jorvis / biocode

Bioinformatics code libraries and scripts
MIT License

correct_gff_feature_order.pl doesn't work #52

Closed: arsilan324 closed this issue 6 years ago

arsilan324 commented 6 years ago

Hello,

When I run the script correct_gff_feature_order.pl, I get this error:

Can't locate bioUtils.pm in @INC (you may need to install the bioUtils module) (@INC contains: /Users/arslan/Documents/Juncus/EMBL/EMBLmyGFF3/../lib /Library/Perl/5.18/darwin-thread-multi-2level /Library/Perl/5.18 /Network/Library/Perl/5.18/darwin-thread-multi-2level /Network/Library/Perl/5.18 /Library/Perl/Updates/5.18.2/darwin-thread-multi-2level /Library/Perl/Updates/5.18.2 /System/Library/Perl/5.18/darwin-thread-multi-2level /System/Library/Perl/5.18 /System/Library/Perl/Extras/5.18/darwin-thread-multi-2level /System/Library/Perl/Extras/5.18 .) at correct_gff_feature_order.pl line 76.
BEGIN failed--compilation aborted at correct_gff_feature_order.pl line 76.

Can you please tell me how I can fix this? Thanks.

jorvis commented 6 years ago

This is one of the few Perl scripts left in biocode, which also rely on the bioUtils.pm module. Did you install biocode via pip? Within that Python packaging framework I can't properly place the Perl module. You should be able to run the script if you instead check out biocode from GitHub and make sure biocode/lib/ is in your PERL5LIB environment variable. Alternatively, copy bioUtils.pm from a checkout into a directory where Perl will find it.
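If the script is launched from Python, the PERL5LIB suggestion could be sketched roughly as follows. The helper name `env_with_perl5lib` and the checkout path are hypothetical and only for illustration; the one thing taken from the explanation above is that biocode/lib/ has to end up in PERL5LIB.

```python
import os

def env_with_perl5lib(lib_dir, base_env=None):
    """Return a copy of the environment with lib_dir prepended to PERL5LIB."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("PERL5LIB", "")
    # PERL5LIB is a list of directories joined by the path separator (":" on Unix).
    # Drop empty entries and any earlier copy of lib_dir so it appears once, first.
    parts = [lib_dir] + [p for p in existing.split(os.pathsep) if p and p != lib_dir]
    env["PERL5LIB"] = os.pathsep.join(parts)
    return env

# Hypothetical checkout location; adjust to wherever biocode was cloned.
env = env_with_perl5lib(os.path.expanduser("~/src/biocode/lib"))
print(env["PERL5LIB"])
```

The returned dictionary can be passed as the `env` argument of `subprocess.run` when invoking perl on the script, which leaves the calling shell's environment untouched.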

bernt-matthias commented 6 years ago

Hi, I'm a colleague of @arsilan324. I got it running on my computer (thanks for your explanations), but ran into other problems:

panic! don't know what to do with feat type: 3'UTR at ./gff/correct_gff_feature_order.pl line 186, <$ifh> line 6.

found more than one gene at position 1 on molecule Transcript_100004

The GFF3 looks as follows:

Transcript_100004       transdecoder    gene    1       813     .       +       .       ID=Transcript_100004|g.119957;Name=ORF%20Transcript_100004%7Cg.119957%20Transcript_100004%7Cm.119957%20type%3A5prime_partial%20len%3A200%20%28%2B%29
Transcript_100004       transdecoder    mRNA    1       813     .       +       .       ID=Transcript_100004|m.119957;Parent=Transcript_100004|g.119957;Name=ORF%20Transcript_100004%7Cg.119957%20Transcript_100004%7Cm.119957%20type%3A5prime_partial%20len%3A200%20%28%2B%29
Transcript_100004       transdecoder    CDS     1       600     .       +       .       ID=cds.Transcript_100004|m.119957;Parent=Transcript_100004|m.119957
Transcript_100004       transdecoder    exon    1       813     .       +       .       ID=Transcript_100004|m.119957.exon1;Parent=Transcript_100004|m.119957
Transcript_100004       transdecoder    3'UTR   601     813     .       +       .       ID=Transcript_100004|m.119957.utr3p1;Parent=Transcript_100004|m.119957

Transcript_100004       transdecoder    gene    1       813     .       -       .       ID=Transcript_100004|g.119958;Name=ORF%20Transcript_100004%7Cg.119958%20Transcript_100004%7Cm.119958%20type%3Acomplete%20len%3A128%20%28-%29
Transcript_100004       transdecoder    mRNA    1       813     .       -       .       ID=Transcript_100004|m.119958;Parent=Transcript_100004|g.119958;Name=ORF%20Transcript_100004%7Cg.119958%20Transcript_100004%7Cm.119958%20type%3Acomplete%20len%3A128%20%28-%29
Transcript_100004       transdecoder    CDS     322     705     .       -       .       ID=cds.Transcript_100004|m.119958;Parent=Transcript_100004|m.119958
Transcript_100004       transdecoder    exon    1       813     .       -       .       ID=Transcript_100004|m.119958.exon1;Parent=Transcript_100004|m.119958
Transcript_100004       transdecoder    5'UTR   706     813     .       -       .       ID=Transcript_100004|m.119958.utr5p1;Parent=Transcript_100004|m.119958
Transcript_100004       transdecoder    3'UTR   1       321     .       -       .       ID=Transcript_100004|m.119958.utr3p1;Parent=Transcript_100004|m.119958

I guess there should be only one gene with two child transcripts. Would this then be a bug in the upstream software that produced the GFF file?
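To see how widespread this is in a file, a small check along these lines might help. It is only a sketch and not the script's own logic: the function name `genes_sharing_start` is made up here, and it assumes tab-separated GFF3 with an ID attribute on each gene row. The sample rows are the two gene records above with the Name attribute dropped for brevity.

```python
from collections import defaultdict

def genes_sharing_start(gff_lines):
    """Map (seqid, start) to gene IDs, keeping only positions with more than one gene."""
    by_pos = defaultdict(list)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "gene":
            continue  # only well-formed gene rows are of interest here
        # Column 9 holds semicolon-separated key=value attributes.
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        by_pos[(cols[0], int(cols[3]))].append(attrs.get("ID", "?"))
    return {pos: ids for pos, ids in by_pos.items() if len(ids) > 1}

sample = [
    "\t".join(["Transcript_100004", "transdecoder", "gene", "1", "813",
               ".", "+", ".", "ID=Transcript_100004|g.119957"]),
    "\t".join(["Transcript_100004", "transdecoder", "gene", "1", "813",
               ".", "-", ".", "ID=Transcript_100004|g.119958"]),
]
print(genes_sharing_start(sample))
```

Note that the two genes here sit on opposite strands, and this check deliberately ignores strand, mirroring the wording of the error message rather than any confirmed behaviour of the script.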

jorvis commented 6 years ago

I don't mind modifying it, but in valid GFF3 the third column is supposed to correspond to a Sequence Ontology (SO) term. The parent term is UTR, but there are also the more specific five_prime_UTR and three_prime_UTR. You'd have to change your input file to use those feature types instead. It wouldn't hurt to make a transdecoder ticket too and tell Brian to make his GFF right. :)
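Changing the input file could be done with a few lines like these. This is a minimal sketch, assuming tab-separated GFF3 and that 5'UTR and 3'UTR are the only non-SO labels in column 3; the names `SO_TERMS` and `fix_feature_type` are made up for the example.

```python
# Assumed mapping from the labels seen in this file to the SO terms named above.
SO_TERMS = {"5'UTR": "five_prime_UTR", "3'UTR": "three_prime_UTR"}

def fix_feature_type(line):
    """Rewrite column 3 of one GFF3 line if it uses a known non-SO label."""
    if line.startswith("#"):
        return line  # comments and directives pass through unchanged
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        return line  # leave blank or malformed rows as they are
    cols[2] = SO_TERMS.get(cols[2], cols[2])
    newline = "\n" if line.endswith("\n") else ""
    return "\t".join(cols) + newline

row = "\t".join(["Transcript_100004", "transdecoder", "3'UTR", "601", "813",
                 ".", "+", ".", "ID=x.utr3p1;Parent=x"]) + "\n"
print(fix_feature_type(row), end="")
```

Applying `fix_feature_type` to every line of the file and writing the results to a new file leaves all other columns and attributes untouched.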

jorvis commented 6 years ago

Support added for these two types in commit 57e92bd.

bernt-matthias commented 6 years ago

great. thanks