lawremi / rtracklayer

R interface to genome annotation files and the UCSC genome browser

cannnot determine start/end columns when GFF attributes contain start and/or end #33

Open nathanhaigh opened 4 years ago

nathanhaigh commented 4 years ago

I have GFF files derived from GeMoMa output. When they are loaded with import(), the following error occurs:

Error in .find_start_end_cols

I believe the cause is that GeMoMa writes GFF attributes named start and stop to record the start and stop codons it detected. As a result, multiple columns are called start and stop, and the above error is thrown.
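To check whether a file is affected before importing it, the attribute column can be scanned for keys that clash with range-defining column names. The following is a minimal sketch, not part of rtracklayer: the function name `colliding_attribute_keys`, the `RESERVED` set, and the example record are assumptions made for illustration, and the attribute syntax is assumed to be GFF3-style `key=value` pairs separated by `;`.

```python
# Hypothetical helper: names and the reserved set are illustrative assumptions.
# Standard GFF3 column names, plus "stop", which the report describes as clashing.
RESERVED = {"seqid", "source", "type", "start", "end", "stop",
            "score", "strand", "phase"}


def colliding_attribute_keys(gff_lines):
    """Return the set of attribute keys that clash with reserved column names."""
    found = set()
    for line in gff_lines:
        # Skip comments, directives, and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete 9-column GFF record
        # Column 9 holds the attributes as key=value pairs separated by ';'.
        for pair in fields[8].split(";"):
            key, sep, _value = pair.partition("=")
            if sep and key.strip() in RESERVED:
                found.add(key.strip())
    return found


# Made-up record in the style described above (not real GeMoMa output).
example = [
    "##gff-version 3\n",
    "chr1\tGeMoMa\tmRNA\t100\t900\t.\t+\t.\tID=gene1_R0;start=M;stop=*\n",
]
print(sorted(colliding_attribute_keys(example)))
```

An empty result suggests the attributes should not collide with the start/end columns, at least under the assumed set of reserved names.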

lawremi commented 4 years ago

Thanks for the report. Given that makeGRangesFromDataFrame() is a heuristic, we could change it to select the first columns named "start" or "end" as the columns defining the ranges. Or, given that readGFF() already knows which columns define the ranges, it could construct the GRanges directly, without any ambiguity. Of course, you probably want to rename the "start" and "end" metadata columns after importing just to avoid confusion. I'll defer to @hpages who is the current author of the GFF parser.

nathanhaigh commented 4 years ago

> you probably want to rename the "start" and "end" metadata columns after importing just to avoid confusion.

Indeed, my workaround is to rename them start_codon and end_codon respectively.
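One way to apply such a rename is to rewrite the attribute keys in the GFF file before importing it. This is a minimal sketch under stated assumptions, not necessarily how the rename was done in this thread: the function name `rename_attributes` and the `RENAMES` mapping are hypothetical, the source keys are assumed to be `start` and `stop` as described in the report, and the attributes are assumed to be GFF3-style `key=value` pairs separated by `;`.

```python
# Hypothetical mapping: source keys follow the report, target names follow
# the workaround mentioned above.
RENAMES = {"start": "start_codon", "stop": "end_codon"}


def rename_attributes(line, renames=RENAMES):
    """Rename attribute keys in one GFF line, leaving columns 1-8 untouched."""
    if line.startswith("#"):
        return line  # comments and directives pass through unchanged
    body = line.rstrip("\n")
    newline = line[len(body):]  # preserve the original line ending, if any
    fields = body.split("\t")
    if len(fields) < 9:
        return line  # not a complete 9-column GFF record
    pairs = []
    for pair in fields[8].split(";"):
        key, sep, value = pair.partition("=")
        if sep:
            # Only the key is renamed; the value is kept verbatim.
            key = renames.get(key.strip(), key)
        pairs.append(key + sep + value)
    fields[8] = ";".join(pairs)
    return "\t".join(fields) + newline


# Made-up record for illustration (not real GeMoMa output).
record = "chr1\tGeMoMa\tmRNA\t100\t900\t.\t+\t.\tID=gene1_R0;start=M;stop=*\n"
print(rename_attributes(record), end="")
```

Applying the function to every line of the file and writing the result to a new file leaves the coordinate columns (fields 4 and 5) as the only ones named start and end, so the rewritten file should no longer produce duplicate column names on import.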