ropensci / EDIutils

An API Client for the Environmental Data Initiative Repository
https://docs.ropensci.org/EDIutils/

detect_delimeter.R and "inconsistent number of field delimiters" #10

Closed yvanlebras closed 4 years ago

yvanlebras commented 4 years ago

Dear EDI,

Using EML Assembly Line through MetaShARK (@earnaud), I run into an issue when trying to import a tab-separated data file:

Templating table attributes ...
Warning: Error in detect_errors: occurrence.txt contains an inconsistent number of field delimeters. The correct number of field delimiters for this table appears to be 237. Deviation from this occurs at rows: 141, 209, 520, 1056, 1058, 1059, 1071, 1119, 1120, 1121, 1132, 1140, 1142, 1147, 1153, 1157, 1165, 1185, 1189, 1219, 1238, 3852, 3932 ... Check the number of field delimiters in these rows. All rows of your table must contain a consistent number of fields.

This file is an occurrence.txt file from GBIF. Looking at its content in a text editor, the number of delimiters (\t) appears to be correct, notably on lines 140, 141, and 142. occurrence.txt

I am not fully sure the detect_errors error message comes from EDIutils, but it seems to me that it comes from R/validate_fields.R, doesn't it?

earnaud commented 4 years ago

By running the following chunk of code, I found that the issue seems to be in count.fields(), which is used in validate_arguments. Without the quote argument, the table returns many lines of inconsistent length.

> cf <- count.fields("~/dataPackagesOutput/occurrence.txt", get.delim("~/dataPackagesOutput/occurrence.txt"), quote = c("\"","\'"))
> as.data.frame(table(cf))
   cf  Freq
1 107     1
2 125    22
3 237 10158

clnsmth commented 4 years ago

Thanks for reporting this issue @yvanlebras and for your insights @earnaud!

I've fixed this issue in EMLassemblyline branch fix_41. Please use this branch for your MetaShARK work until it is merged into the development and then master branches.
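
For readers who hit the same message, here is a minimal sketch of how quote handling in count.fields() can produce inconsistent field counts on a tab-separated file. The sample rows and the expected field count of 3 are made up for illustration, and disabling quoting with quote = "" is shown only as one possible workaround. It is not necessarily the change made in fix_41.

```r
# Hypothetical tab-separated table: every row has exactly 3 fields.
# Two free-text values contain an apostrophe, as place names often do.
lines <- c(
  "id\tlocality\tremarks",
  "1\t't Zand\tnone",
  "2\tBaie d'Audierne\tnone",
  "3\tBrest\tnone"
)
path <- tempfile(fileext = ".txt")
writeLines(lines, path)

# Default quoting (quote = "\"'"): the first apostrophe opens a quoted
# string that runs until the next apostrophe, so delimiters and line
# breaks in between are not counted as field boundaries.
with_quotes <- count.fields(path, sep = "\t")

# Quoting disabled: every tab is counted as a field delimiter.
no_quotes <- count.fields(path, sep = "\t", quote = "")

print(with_quotes)
print(no_quotes)
```

Rows that deviate from the expected count can then be listed with which(is.na(with_quotes) | with_quotes != 3), which is a quick way to check whether the flagged rows are genuinely malformed or only contain stray quote characters.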