pedronachtigall / CodAn

CDS prediction in transcripts
Other

not generating minus.fa.index file? #1

Closed: fka21 closed this issue 5 years ago

fka21 commented 5 years ago

Dear developers,

I saw your tool on bioRxiv and it sounds promising, so I wanted to try it out. After running it with basic options on my own data, I get the following error message:

2019-10-09 12:33:51 >>>> starting CodAn (v1.0 September 2019)...

      transcript file -> merged/Pcau_transcriptome2.fasta
      model -> ../../CodAn/models/
      strand prediction -> both
      number of threads -> 1

2019-10-09 12:33:51 >>>> CDS prediction...
2019-10-09 12:33:52 >>>> retrieving sequences...
      number of transcripts -> 39468
      number of predictions -> 0
      predictions at plus strand -> 0
      predictions at minus strand -> 0
Traceback (most recent call last):
  File "/Users/ferenckagan/Documents/Bioinformatic_analysis/CodAn/codan/bin/codan.py", line 478, in <module>
    main()
  File "/Users/ferenckagan/Documents/Bioinformatic_analysis/CodAn/codan/bin/codan.py", line 462, in main
    _codanBOTH(options.transcripts, options.output_folder, options.model, options.cpu)
  File "/Users/ferenckagan/Documents/Bioinformatic_analysis/CodAn/codan/bin/codan.py", line 335, in _codanBOTH
    _retrieveORFBOTH(transcripts, outF+"minus.fa", outF)
  File "/Users/ferenckagan/Documents/Bioinformatic_analysis/CodAn/codan/bin/codan.py", line 274, in _retrieveORFBOTH
    os.remove(outF+"minus.fa.index")
FileNotFoundError: [Errno 2] No such file or directory: './minus.fa.index'

I have the dependencies installed (except BioPerl, since the current version relies on installing specific modules and I do not know which ones CodAn requires). I unzipped the invertebrate models into their own directory and left them there, without the rest of the models. I probably missed something during setup. Thank you for your feedback.

pedronachtigall commented 5 years ago

Hi @fka21 ,

Thank you for considering CodAn for your analysis.

As far as I can see, it is not making any predictions in the prediction step. Did it generate any output files?
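Reading the traceback, the FileNotFoundError looks like a secondary symptom: the prediction step reported 0 predictions, so presumably minus.fa.index was never created, and the unconditional os.remove(outF+"minus.fa.index") then fails. The sketch below is not CodAn's code; it is a hypothetical illustration of failing early with a readable message when nothing was predicted, and of tolerating a missing index file. It assumes (not confirmed in this thread) that predictions land in plus.fa and minus.fa inside the output folder; the helper names are made up, and Path.unlink(missing_ok=True) needs Python 3.8 or newer.

```python
from pathlib import Path


def count_fasta_records(path):
    """Count header lines ('>') in a FASTA file; a missing file counts as 0."""
    p = Path(path)
    if not p.is_file():
        return 0
    with p.open() as handle:
        return sum(1 for line in handle if line.startswith(">"))


def check_predictions(out_dir):
    """Hypothetical guard: raise a readable error if no CDS was predicted.

    Assumption: predictions are written to plus.fa and minus.fa in out_dir.
    """
    out = Path(out_dir)
    plus = count_fasta_records(out / "plus.fa")
    minus = count_fasta_records(out / "minus.fa")
    if plus + minus == 0:
        raise RuntimeError(
            "CDS prediction produced no results; check the model path (-m) "
            "and that the Perl dependencies (BioPerl, MCE) load correctly"
        )
    return plus, minus


def remove_if_present(path):
    """Delete a temporary file such as minus.fa.index, ignoring it if absent."""
    Path(path).unlink(missing_ok=True)  # Python 3.8+
```

With a guard like this, a run with zero predictions would stop with a message pointing at the model and the Perl setup instead of a FileNotFoundError about an index file.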

You need to ensure that all of these modules are installed and working properly with Perl:

      use File::Basename;
      use strict;
      use warnings;
      use Data::Dumper;
      use Getopt::Long;
      use Bio::SeqIO;
      use Bio::DB::Fasta;
      use Cwd 'abs_path';
      use Digest::MD5 'md5_hex';
      use FileHandle;
      use IPC::Open2;
      use MCE;
      use MCE::Mutex;
      use File::Basename 'dirname';
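One quick way to test the whole list at once is to ask perl to load each module: `perl -M<Module> -e 1` exits with a non-zero status when the module cannot be found. A minimal Python sketch of that check (the helper name is hypothetical). It uses whichever perl comes first on PATH and inherits the current PERL5LIB, so, assuming CodAn's Perl scripts start with a `#!/usr/bin/env perl` shebang (an assumption), it should see the same perl they do when launched from the same shell.

```python
import shutil
import subprocess

# Modules listed above; File::Basename appears twice but only needs one check.
REQUIRED_MODULES = [
    "File::Basename", "strict", "warnings", "Data::Dumper", "Getopt::Long",
    "Bio::SeqIO", "Bio::DB::Fasta", "Cwd", "Digest::MD5", "FileHandle",
    "IPC::Open2", "MCE", "MCE::Mutex",
]


def missing_perl_modules(modules, perl="perl"):
    """Return the modules that `perl -M<module> -e 1` fails to load.

    Uses the first `perl` on PATH and inherits the current environment,
    so a PERL5LIB exported in this shell is honored.
    """
    exe = shutil.which(perl)
    if exe is None:
        raise FileNotFoundError(f"{perl!r} not found on PATH")
    missing = []
    for module in modules:
        result = subprocess.run([exe, f"-M{module}", "-e", "1"],
                                capture_output=True, text=True)
        if result.returncode != 0:
            missing.append(module)
    return missing


if __name__ == "__main__":
    try:
        missing = missing_perl_modules(REQUIRED_MODULES)
    except FileNotFoundError as err:
        print(err)
    else:
        print("perl:", shutil.which("perl"))
        print("missing:", ", ".join(missing) if missing else "none")
```

Printing the perl path matters when conda is installed: a conda perl and the system perl have separate module directories, so a module installed for one may be invisible to the other.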

I will add them as "requirements" in the README.

Try to install all of these modules and run the prediction. Let me know if it works or if you are still having troubles.

Best regards, Pedro

fka21 commented 5 years ago

Dear Pedro,

Thank you for your answer. I tried to install all of the modules manually using cpanm, but the BioPerl ones seem to have issues, since cpanm won't install their dependencies. I tried to install those manually through cpanm as well, but then I got another error message saying that the dependencies of the dependencies did not install. I will try to resolve this issue and then try your tool again to see whether the missing modules were the original culprit. I will keep you updated.

Best regards, Ferenc

pedronachtigall commented 5 years ago

Dear Ferenc (@fka21 ),

Did you manage to get CodAn running on your system? I tried CodAn on a clean Ubuntu (a Docker container with a clean Ubuntu 18). It only needed the Biopython, BioPerl and MCE libraries, which I installed with the following command:

apt-get install python3-biopython bioperl libmce-perl

Then I installed CodAn following our instructions. In this scenario, CodAn ran and performed the predictions. So, try running the command above in your terminal, or try using a Docker container with Ubuntu 18 and follow the instructions stated here. Let me know if it worked or if you need any help. I will be happy to help you!

Best regards, Pedro

fka21 commented 5 years ago

Dear Pedro,

I tried your suggestion on my own Ubuntu 18 machine. Everything installed correctly, but I still get the same error message. Admittedly, the system is not clean, since I use conda for software and packages, but I get the same error message even when no conda environment is activated. Do you have any other suggestions?

Appreciate the help!

Best regards, Ferenc

ayoshiaki commented 5 years ago

Dear Ferenc (@fka21 ),

CodAn is composed of three individual programs: (i) tops-viterbi_decoding; (ii) predict; and (iii) codan.py.

We can try to isolate the issue in your installation by running each program individually.

Can you execute each program individually and send us the output?

Just run each of the following programs without any parameters:

predict

tops-viterbi_decoding

codan.py

If a library is missing in your installation, then an error will appear.
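The same check can be scripted so that all three outputs are collected in one go. A minimal Python sketch, not part of CodAn: it looks each program up on PATH, runs it once with no arguments (stdin closed and a timeout added in case a program waits for input, which is an assumption about their behavior), and flags output containing common missing-library messages. The marker list is a heuristic, not an exhaustive one.

```python
import shutil
import subprocess

PROGRAMS = ["predict", "tops-viterbi_decoding", "codan.py"]

# Heuristic substrings that usually mean a library is missing:
# Perl's @INC error, Python import errors, and the Linux dynamic loader error.
MISSING_LIB_MARKERS = (
    "Can't locate",
    "ModuleNotFoundError",
    "ImportError",
    "error while loading shared libraries",
)


def probe(program, timeout=30):
    """Run `program` once with no arguments; return (status, combined output)."""
    exe = shutil.which(program)
    if exe is None:
        return "not on PATH", ""
    try:
        result = subprocess.run([exe], capture_output=True, text=True,
                                errors="replace", stdin=subprocess.DEVNULL,
                                timeout=timeout)
    except subprocess.TimeoutExpired:
        return "timed out", ""
    output = result.stdout + result.stderr
    if any(marker in output for marker in MISSING_LIB_MARKERS):
        return "missing library", output
    return f"exit code {result.returncode}", output


if __name__ == "__main__":
    for program in PROGRAMS:
        status, output = probe(program)
        print(f"== {program}: {status}")
        if output:
            print(output.strip())
```

A non-zero exit code on its own is not necessarily a problem, since a program called without its required arguments will usually just print its usage and exit with an error status.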

Best regards, Andre

alandurham commented 5 years ago

It seems to me you just forgot to put the correct path to the model: it should be ../../CodAn/Models/INV if your invertebrate model is in the directory INV.

fka21 commented 5 years ago

Dear all,

Thank you for the many suggestions. @ayoshiaki I tried running the programs without any parameters, and predict gave me an error message:

Can't locate Bio/SeqIO.pm in @INC (you may need to install the Bio::SeqIO module) (@INC contains: /home/ferenc/miniconda3/lib/site_perl/5.26.2/x86_64-linux-thread-multi /home/ferenc/miniconda3/lib/site_perl/5.26.2 /home/ferenc/miniconda3/lib/5.26.2/x86_64-linux-thread-multi /home/ferenc/miniconda3/lib/5.26.2 .) at CodAn/codan/bin/predict line 8.

SeqIO.pm is on my system, so I located it and set the path to it with:

sudo find / -name "SeqIO.pm" -> /usr/share/perl5/Bio/SeqIO.pm

export PERL5LIB=/usr/share/perl5

After that, predict worked properly, and so did tops-viterbi_decoding. But codan.py still gives the same error message as before when I try basic usage. @alandurham Thank you also for the correction; I now set the full path to the model directories, but unfortunately it does not help.
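The @INC list in the predict error suggests a likely cause: the perl being run is the Miniconda one (/home/ferenc/miniconda3/...), while SeqIO.pm was found under /usr/share/perl5, a system location. An exported PERL5LIB only applies to the shell where it was set and the processes started from it. A rough Python sketch (helper names are hypothetical) that prints which perl is first on PATH and where, if anywhere, its @INC (including whatever PERL5LIB adds) would find the two BioPerl modules. The name-to-file mapping mirrors how Perl's require turns Bio::SeqIO into Bio/SeqIO.pm.

```python
import shutil
import subprocess
from pathlib import Path


def perl_inc(perl="perl"):
    """Return (path to perl, its @INC list), as seen from the current environment."""
    exe = shutil.which(perl)
    if exe is None:
        raise FileNotFoundError(f"{perl!r} not found on PATH")
    result = subprocess.run([exe, "-e", 'print join("\\n", @INC)'],
                            capture_output=True, text=True, check=True)
    return exe, [d for d in result.stdout.splitlines() if d]


def module_to_relpath(module):
    """Map a module name to the file Perl looks for, e.g. Bio::SeqIO -> Bio/SeqIO.pm."""
    return Path(module.replace("::", "/") + ".pm")


def locate(module, inc_dirs):
    """Return the first @INC entry's copy of `module`, or None if no entry has it."""
    relpath = module_to_relpath(module)
    for directory in inc_dirs:
        candidate = Path(directory) / relpath
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    try:
        exe, inc = perl_inc()
    except (FileNotFoundError, subprocess.CalledProcessError) as err:
        print(err)
    else:
        print("perl:", exe)
        for module in ("Bio::SeqIO", "Bio::DB::Fasta"):
            print(f"{module} -> {locate(module, inc) or 'NOT FOUND in @INC'}")
```

Running this once with the conda environment activated and once without it should show whether both runs pick up the same perl and the same BioPerl files.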

Best regards, Ferenc

ayoshiaki commented 5 years ago

Dear Ferenc (@fka21),

We have written a tutorial explaining the usage, and I hope that it can help you.

https://github.com/pedronachtigall/CodAn/tree/master/tutorial

Sometimes CodAn has problems processing FASTA files whose headers contain symbols such as ":" and "|". Can you send us the first three headers of your FASTA file? You can get them with the following command:

cat  Pcau_transcriptome2.fasta | grep ">" | head -3
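With 39,468 transcripts, it may be worth checking every header rather than only the first three. A small Python sketch (the function and script names are hypothetical) that returns the first few headers plus every header containing ":" or "|", the two characters mentioned above:

```python
import sys

SUSPECT_CHARS = (":", "|")


def scan_headers(lines, show=3, suspect=SUSPECT_CHARS):
    """Return (first `show` headers, all headers containing a suspect character)."""
    first, flagged = [], []
    for line in lines:
        if not line.startswith(">"):
            continue
        header = line[1:].rstrip("\r\n")
        if len(first) < show:
            first.append(header)
        if any(ch in header for ch in suspect):
            flagged.append(header)
    return first, flagged


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as handle:
        first, flagged = scan_headers(handle)
    print("first headers:", *first, sep="\n  ")
    print(f"{len(flagged)} header(s) contain ':' or '|'")
    for header in flagged[:10]:
        print("  ", header)
```

Usage, assuming the script is saved as scan_headers.py: python scan_headers.py Pcau_transcriptome2.fasta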

Best regards, André.

fka21 commented 5 years ago

Dear Andre,

I appreciate the persistent help! I don't see any of the symbols mentioned in the headers, here are the first 3 headers:

">PcaudatusEvg000003t1" ">PcaudatusEvg000010t1" ">PcaudatusEvg011602t1"

Do you have any test data perhaps?

Best regards, Ferenc

ayoshiaki commented 5 years ago

Hi Ferenc,

I have more ideas to try to help you.

The Bio::DB::Fasta library is responsible for creating the .index file, so the error may be related to this package: it cannot process a FASTA file with lines longer than 65,536 characters.

  1. Check whether your system has it installed.
  2. Check whether your FASTA file contains very long lines (more than 65,536 characters).
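Point 2 can be checked without opening the file in an editor. A minimal Python sketch (hypothetical script, using the 65,536 figure quoted above as the threshold) that reports every line longer than the limit:

```python
import sys

LIMIT = 65_536  # line-length threshold quoted in this thread


def long_lines(lines, limit=LIMIT):
    """Yield (line_number, length) for every line longer than `limit` characters."""
    for number, line in enumerate(lines, start=1):
        length = len(line.rstrip("\r\n"))
        if length > limit:
            yield number, length


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as handle:
        hits = list(long_lines(handle))
    print(f"{len(hits)} line(s) longer than {LIMIT} characters")
    if hits:
        print("longest:", max(length for _, length in hits))
```

The file is read line by line, so even a large transcriptome is not loaded into memory at once.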

Another idea is to put both the model folder (INV_partial or INV_full) and Pcau_transcriptome2.fasta in the folder from which you run codan.py:

codan.py -t Pcau_transcriptome2.fasta -m INV_partial

Our tutorial folder has an example of test data.

Andre.

pedronachtigall commented 5 years ago

Hi Ferenc (@fka21),

Following Andre's comments, you should try running our quick tutorial. There you will find a test sequence and guidelines for running CodAn.

Moreover, since very long lines can break the Bio::DB::Fasta library, and Trinity (like other de novo assemblers) generally writes each output sequence on a single line, which can produce very long lines, I added a script that breaks sequences into lines of 100 nucleotides. Do the following:
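The script itself is not reproduced in this thread. As an illustration only, and not Pedro's script, here is a minimal Python sketch of the same idea: every sequence is re-wrapped to at most 100 characters per line, while header lines pass through unchanged.

```python
import sys


def _chunks(sequence, width):
    """Split a sequence string into consecutive pieces of at most `width` characters."""
    for start in range(0, len(sequence), width):
        yield sequence[start:start + width]


def rewrap_fasta(lines, width=100):
    """Yield output lines (without newlines) with sequences wrapped to `width`."""
    pieces = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # Flush the previous record's sequence before emitting the next header.
            yield from _chunks("".join(pieces), width)
            pieces = []
            yield line
        elif line:
            pieces.append(line)
    yield from _chunks("".join(pieces), width)


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
        for out_line in rewrap_fasta(src):
            dst.write(out_line + "\n")
```

Usage, with hypothetical file names: python rewrap_fasta.py Pcau_transcriptome2.fasta Pcau_transcriptome2.wrapped.fasta. Joining the pieces first means input that is already wrapped at some other width is also normalized to 100 characters.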

Best regards, Pedro

fka21 commented 5 years ago

Dear Andre and Pedro,

I have tried both of your suggestions. I confirmed that the required package is installed by checking with perldoc -l Bio::DB::Fasta, and I also manually added it to @INC just to be sure it is loaded. Furthermore, I tried the script kindly provided by Pedro. I used its output (and checked that its lines are at most 100 characters long, which they were) and moved everything into one directory, but despite all of these efforts I still get the same error message.

I also tried the tutorial, but I get the same error message.

Best regards, Ferenc

alandurham commented 5 years ago

Dear Ferenc,

We are all puzzled by your problem. I have just re-downloaded CodAn and ran it again with no problem. However, whatever the problem is, we are sure it can happen to other people, and we are grateful for your patience. To really understand what is happening, I suggest a Google Hangouts or Skype call, so that by screen sharing we can see what is happening and try things right away. That should be MUCH faster. What do you think? You can email me at aland@usp.br and we will set up the meeting.

RhettRautsaw commented 4 years ago

I also just got this message and was able to solve it by creating the following conda environment:

conda create -n venomancer_env python=3.7 biopython perl perl-bioperl perl-mce blast
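That this fixed it suggests the earlier failures may have come from mixing a conda perl or python with libraries installed for the system ones. A minimal Python sketch, assuming the environment is activated so that CONDA_PREFIX is set, that lists which tools resolve to executables outside that environment. The default tool list (perl and python3) is an assumption; extend it as needed.

```python
import os
import shutil
from pathlib import Path


def tools_outside_env(tools=("perl", "python3"), prefix=None):
    """Map each tool not found inside the conda prefix to its resolved path (or None)."""
    prefix = prefix or os.environ.get("CONDA_PREFIX")
    if not prefix:
        raise RuntimeError("no conda environment is active (CONDA_PREFIX is unset)")
    root = Path(prefix).resolve()
    outside = {}
    for tool in tools:
        exe = shutil.which(tool)
        # A tool counts as inside the env only if its real path lies under the prefix.
        if exe is None or root not in Path(exe).resolve().parents:
            outside[tool] = exe
    return outside


if __name__ == "__main__":
    try:
        result = tools_outside_env()
    except RuntimeError as err:
        print(err)
    else:
        if not result:
            print("perl and python3 both come from the active conda environment")
        for tool, exe in result.items():
            print(f"outside the active env: {tool} -> {exe or 'not found on PATH'}")
```

If perl shows up outside the environment, modules installed through perl-bioperl and perl-mce in that environment would not be visible to it.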