stheil15 / virAnnot


some conflicts and still can not run #3

Open poursalavati opened 3 years ago

poursalavati commented 3 years ago

Hi, I'm trying to run this tool on our HPC, but there are still problems. I fixed some of them, as described below; the code may need to be modified accordingly:

1- This download address has changed, please replace it:

ftp://ftp.ncbi.nih.gov/pub/taxonomy/obsolete/gi_taxid_prot.dmp.gz

(instead of: ftp://ftp.ncbi.nih.gov/pub/taxonomy/gi_taxid_prot.dmp.gz)

2- Errors related to loadTaxonomy.pl execution:

"dead_prot=s"=> \$data_dead_acc_prot,
"dead_nucl=s"=> \$data_dead_acc_nucl,

./loadTaxonomy.pl -struct taxonomyStructure.sql -index taxonomyIndex.sql -acc_prot acc2taxid.prot -acc_nucl acc2taxid.nucl -names names.dmp -nodes nodes.dmp -gi_prot gi_taxid_prot.dmp -acc_wgs acc2taxid.nucl -dead_nucl dead_nucl.accession2taxid -dead_prot dead_prot.accession2taxid

And these are the messages you receive:

2021/06/26 12:23:20  INFO> loadTaxonomy.pl-bac:122 main::_create_sqlite_db - Creating database.
2021/06/26 12:23:22  INFO> loadTaxonomy.pl-bac:78 main::_insertingCSVDataInDatabase - Inserting tables into database...
2021/06/26 12:23:22  INFO> loadTaxonomy.pl-bac:80 main::_insertingCSVDataInDatabase - nodes
2021/06/26 12:23:22  INFO> loadTaxonomy.pl-bac:80 main::_insertingCSVDataInDatabase - nucl_accession2taxid
2021/06/26 12:23:22  INFO> loadTaxonomy.pl-bac:80 main::_insertingCSVDataInDatabase - prot_accession2taxid
2021/06/26 12:23:22  INFO> loadTaxonomy.pl-bac:80 main::_insertingCSVDataInDatabase - names
2021/06/26 12:23:22  INFO> loadTaxonomy.pl-bac:80 main::_insertingCSVDataInDatabase - gi_prot

Unfortunately, even after fixing all of these, taxonomy.tmp.sqlite is still only 80 KB!
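One way to confirm whether the database was actually populated is to count the rows in each table. The following is a minimal sketch using Python's standard sqlite3 module. The function name table_row_counts is hypothetical, and the table names are read from the file itself rather than assumed, since the schema created by loadTaxonomy.pl is not shown here.

```python
import os
import sqlite3
from contextlib import closing


def table_row_counts(db_path):
    """Return {table_name: row_count} for every table in a SQLite file."""
    # sqlite3.connect() would silently create a missing file, so check first.
    if not os.path.exists(db_path):
        raise FileNotFoundError(db_path)
    with closing(sqlite3.connect(db_path)) as conn:
        # sqlite_master lists every object in the database; keep only tables.
        tables = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        return {
            name: conn.execute('SELECT COUNT(*) FROM "{}"'.format(name)).fetchone()[0]
            for name in tables
        }


if __name__ == "__main__":
    path = "taxonomy.tmp.sqlite"  # file name taken from the report above
    if os.path.exists(path):
        for name, count in table_row_counts(path).items():
            print("{}\t{}".format(name, count))
    else:
        print("{} not found in the current directory".format(path))
```

If every table reports zero rows, the tables were created but the data import did not run, which would be consistent with a file of only about 80 KB.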

3- The other fix is about the PFAM taxonomy in the manual.

This section needs to be modified: mkdir pfam should be mkdir fasta

Unfortunately, executing this command:

ls -1 pfam*.FASTA | sed 's,^\(.*\)\.FASTA,./gi2taxonomy.pl -i & -o \1.tax.txt -db taxonomy.tmp.sqlite -r,' | bash

produces a large number of these messages:

WARN - tax_id not found for gi: ########

This is probably due to a problem with the taxonomy.tmp.sqlite in the previous section, which was not fully created.
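To see exactly which commands the sed pipeline generates before piping them to bash, the substitution can be reproduced and printed instead of executed. This is a sketch in Python that mirrors the sed expression quoted above; the function name build_command is hypothetical, and the gi2taxonomy.pl flags are copied from the command as posted, not verified against the script.

```python
import glob
import re

# Same pattern as the sed expression: capture everything before ".FASTA".
FASTA_PATTERN = re.compile(r"^(.*)\.FASTA")


def build_command(filename, db="taxonomy.tmp.sqlite"):
    """Return the gi2taxonomy.pl command line that sed would emit for filename."""
    # group(0) is the whole match (sed's "&"); group(1) is the stem (sed's "\1").
    return FASTA_PATTERN.sub(
        lambda m: "./gi2taxonomy.pl -i {} -o {}.tax.txt -db {} -r".format(
            m.group(0), m.group(1), db),
        filename,
        count=1,
    )


if __name__ == "__main__":
    # Print the commands only; nothing is executed.
    for name in sorted(glob.glob("pfam*.FASTA")):
        print(build_command(name))
```

Printing first makes it easy to run a single generated command by hand and check whether the warnings come from one input file or from every lookup against the database.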

Thank you for your help in resolving this issue; please update the code if changes are needed. Sincerely yours, Naser

marieBvr commented 3 years ago

Hi Naser, Thank you for pointing out all these issues. It seems you are using the new documentation but posting the issue on an old repository. I suggest you use the current project at https://github.com/marieBvr/virAnnot. Be careful to use the slurm-branch if your server uses slurm.

Nevertheless, I will check and update the current project and its documentation based on your suggestions.

Let me know (on the other repository) if you still face issues. Sincerely yours, Marie