nextgenusfs / amptk

AMPtk: Amplicon ToolKit for NGS data (formerly UFITS)
http://amptk.readthedocs.io/
BSD 2-Clause "Simplified" License

Error when run quick start #103

Open CourierFlag opened 1 year ago

CourierFlag commented 1 year ago

I was learning AMPtk and followed the AMPtk Quick Start guide to process the test data. I got this error at the filtering step:

################
Traceback (most recent call last):
  File "/home/user1/miniconda3/envs/amptk154/bin/amptk", line 10, in <module>
    sys.exit(main())
  File "/home/user1/miniconda3/envs/amptk154/lib/python3.8/site-packages/amptk/amptk.py", line 784, in main
    mod.main(arguments)
  File "/home/user1/miniconda3/envs/amptk154/lib/python3.8/site-packages/amptk/filter.py", line 300, in main
    filt2 = filtered.loc[(filtered != 0).any(1)]
TypeError: any() takes 1 positional argument but 2 were given
#################

The same error recurred when I reinstalled from pip or conda and when I used different versions of AMPtk, including v1.5.4 and v1.5.5. I also tried different Python versions (3.10 and 3.8), and the error still occurred.

I edited "site-packages/amptk/filter.py" at line 300, changing
filt2 = filtered.loc[(filtered != 0).any(1)]
to
filt2 = filtered.loc[(filtered != 0).any(axis=1)]
After that, the filter step ran successfully.

But when I ran amptk taxonomy after LULU, I hit another problem:

#############
[03:51:57 PM]: OS: Ubuntu 20.04, 8 cores, ~ 25 GB RAM. Python: 3.8.18
[03:51:57 PM]: AMPtk v1.5.4, VSEARCH v2.24.0
[03:51:57 PM]: Loading FASTA Records
[03:51:57 PM]: 5 OTUs
[03:51:57 PM]: Global alignment OTUs with usearch_global (VSEARCH) against ITS.udb
[03:51:57 PM]: Classifying OTUs with SINTAX (VSEARCH)
[03:51:57 PM]: SINTAX results empty
[03:51:57 PM]: Parsing taxonomy failed -- see logfile
###################

There was no output at all! I checked the log file, which shows:

####################
[11/02/23 15:51:56]: /home/user1/miniconda3/envs/amptk154/bin/amptk taxonomy -f miseq.lulu.otus.fa -i miseq.lulu.otu_table.txt -m miseq.mapping_file.txt -d ITS2 -o miseq

[11/02/23 15:51:57]: OS: Ubuntu 20.04, 8 cores, ~ 25 GB RAM. Python: 3.8.18
[11/02/23 15:51:57]: Python Modules: numpy v1.24.4, pandas v2.0.3, matplotlib v3.4.3, psutil v5.9.5, natsort v8.4.0, biopython v1.81, edlib v1.3.9, biom-format v2.1.15
[11/02/23 15:51:57]: AMPtk v1.5.4, VSEARCH v2.24.0
[11/02/23 15:51:57]: Loading FASTA Records
[11/02/23 15:51:57]: 5 OTUs
[11/02/23 15:51:57]: Global alignment OTUs with usearch_global (VSEARCH) against ITS.udb
[11/02/23 15:51:57]: vsearch --usearch_global miseq.lulu.otus.fa --db /home/user1/miniconda3/envs/amptk154/lib/python3.8/site-packages/amptk/DB/ITS.udb --userout miseq.usearch.txt --id 0.7 --strand both --output_no_hits --maxaccepts 500 --top_hits_only --userfields query+target+id --notrunclabels --threads 8
[11/02/23 15:51:57]: vsearch v2.24.0_linux_x86_64, 23.9GB RAM, 8 cores
https://github.com/torognes/vsearch

Fatal error: Unable to get status for input file (/home/user1/miniconda3/envs/amptk154/lib/python3.8/site-packages/amptk/DB/ITS.udb)

[11/02/23 15:51:57]: Classifying OTUs with SINTAX (VSEARCH)
[11/02/23 15:51:57]: vsearch --sintax miseq.lulu.otus.fa --db /home/user1/miniconda3/envs/amptk154/lib/python3.8/site-packages/amptk/DB/ITS2_SINTAX.udb --tabbedout miseq.sintax.txt -sintax_cutoff 0.8 --threads 8 --notrunclabels
[11/02/23 15:51:57]: vsearch v2.24.0_linux_x86_64, 23.9GB RAM, 8 cores
https://github.com/torognes/vsearch

Fatal error: Unable to get status for input file (/home/user1/miniconda3/envs/amptk154/lib/python3.8/site-packages/amptk/DB/ITS2_SINTAX.udb)

[11/02/23 15:51:57]: SINTAX results empty
[11/02/23 15:51:57]: Global alignment results parsed, resulting in 0 taxonomy predictions
[11/02/23 15:51:57]: Combined OTU taxonomy dictionary contains 0 taxonomy predictions
[11/02/23 15:51:57]: Parsing taxonomy failed -- see logfile
################

The same error also occurred when I ran AMPtk v1.5.5.
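
The two "Unable to get status for input file" errors in the log mean VSEARCH could not find or stat the .udb files it was pointed at. A minimal sketch for checking this before running the taxonomy step is below. The helper name `missing_databases` and the placeholder directory are assumptions for illustration, not part of AMPtk; only the two file names come from the log.

```python
from pathlib import Path
from typing import Iterable, List


def missing_databases(db_dir: str, names: Iterable[str]) -> List[str]:
    """Return the database file names that are absent or empty under db_dir.

    Hypothetical helper, not an AMPtk function.
    """
    root = Path(db_dir)
    missing = []
    for name in names:
        path = root / name
        # Flag files that do not exist; zero-byte files are flagged too,
        # on the assumption that an empty database is not usable.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Placeholder path: substitute the DB directory of your own environment.
    db_dir = "path/to/site-packages/amptk/DB"
    print(missing_databases(db_dir, ["ITS.udb", "ITS2_SINTAX.udb"]))
```

Any name this prints is a database that the taxonomy step would fail to open.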

nextgenusfs commented 1 year ago

It looks like the ITS database is either corrupt or not installed properly. What is the output of amptk info? ie for me:

$ amptk info
------------------------------
Running AMPtk v 1.6.0
------------------------------
Taxonomy Databases Installed: /Users/jon/miniconda3/envs/amptk/lib/python3.7/site-packages/amptk/DB
------------------------------
  DB_name   DB_type                         FASTA                         Fwd Primer Rev Primer Records Source Version     Date   
        16S vsearch                                rdp_16s_v16.kingdom.fa     None        None    13118   RDP       v16 2019-02-18
 16S_SINTAX  sintax                                rdp_16s_v16.kingdom.fa    515FB       806RB     9679   RDP       v16 2019-02-18
        COI vsearch                      arth-chord.bold.reformated.fasta  LCO1490   mlCOIintR  1617885  BOLD  20190219 2019-02-19
 COI_SINTAX  sintax                           arth-chord.bold.fixed.fasta  LCO1490   mlCOIintR   381032  BOLD  20190219 2020-09-14
        ITS vsearch                     UNITE_public_all_29.11.2022.fasta   ITS1-F        ITS4  6484445 UNITE       9.3 2023-03-22
ITS1_SINTAX  sintax sh_general_release_dynamic_s_all_29.11.2022_dev.fasta   ITS1-F        ITS2   258465 UNITE       9.3 2023-03-22
ITS2_SINTAX  sintax sh_general_release_dynamic_s_all_29.11.2022_dev.fasta    fITS7        ITS4   231038 UNITE       9.3 2023-03-22
 ITS_SINTAX  sintax sh_general_release_dynamic_s_all_29.11.2022_dev.fasta   ITS1-F        ITS4   290642 UNITE       9.3 2023-03-22
        LSU vsearch                                     RDP_v8.0_fungi.fa     None        None    91823   RDP         8 2019-02-12
 LSU_SINTAX  sintax                                     RDP_v8.0_fungi.fa     None        None    91823   RDP         8 2019-02-12
        PR2 vsearch                     pr2_version_4.14.0_SSU_UTAX.fasta    616-f       1132r   197106   PR2    4.14.0 2021-11-25
 PR2_SINTAX  sintax                     pr2_version_4.14.0_SSU_UTAX.fasta    616-f       1132r    99991   PR2    4.14.0 2021-11-25
------------------------------

CourierFlag commented 1 year ago

I installed the database, and that solved the second problem. But I wonder whether my change is a correct fix for the first issue?

nextgenusfs commented 1 year ago

Sorry I didn't notice the first issue -- the formatting was all blended together. Can you rephrase the first issue and include commands you used to generate those errors?

CourierFlag commented 1 year ago

I'm sorry for my confusing reply, and thank you for your help.

Here is the first issue. I used the command below to filter:

amptk filter -i miseq.otu_table.txt -f miseq.cluster.otus.fa -b spike -m mock2

and it failed with this error:

[02:56:43 PM]: OS: Ubuntu 20.04, 8 cores, ~ 25 GB RAM. Python: 3.8.18
[02:56:43 PM]: AMPtk v1.5.4, VSEARCH v2.24.0
[02:56:43 PM]: Loading OTU table: miseq.otu_table.txt
[02:56:43 PM]: OTU table contains 3 samples, 21 OTUs, and 315 reads counts
[02:56:43 PM]: Mapping OTUs to Mock Community (USEARCH)
[02:56:44 PM]: 8 mock missing: mock7, mock8, mock11, mock15, mock16, mock17, mock18, mock24
[02:56:44 PM]: Sorting OTU table naturally
Traceback (most recent call last):
  File "/home/dxw/miniconda3/envs/amptk154/bin/amptk", line 10, in <module>
    sys.exit(main())
  File "/home/dxw/miniconda3/envs/amptk154/lib/python3.8/site-packages/amptk/amptk.py", line 784, in main
    mod.main(arguments)
  File "/home/dxw/miniconda3/envs/amptk154/lib/python3.8/site-packages/amptk/filter.py", line 300, in main
    filt2 = filtered.loc[(filtered != 0).any(1)]
TypeError: any() takes 1 positional argument but 2 were given

I tried to fix it by editing "site-packages/amptk/filter.py" at line 300, changing
filt2 = filtered.loc[(filtered != 0).any(1)]
to
filt2 = filtered.loc[(filtered != 0).any(axis=1)]
After that, it seemed to run successfully.
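
For reference, the TypeError is consistent with pandas 2.0 and later, where DataFrame.any takes axis as a keyword argument only; the log shows pandas v2.0.3 in this environment. A minimal sketch of the keyword form follows. The small table and its labels are made up for illustration and are not the Quick Start data.

```python
import pandas as pd

# Made-up stand-in for an OTU table: rows are OTUs, columns are samples.
filtered = pd.DataFrame(
    {"sampleA": [0, 5, 0], "sampleB": [0, 0, 2]},
    index=["OTU1", "OTU2", "OTU3"],
)

# The keyword form is accepted by both pandas 1.x and 2.x.
# True for every row that has at least one non-zero count.
keep = (filtered != 0).any(axis=1)

# Drop the rows (OTUs) that are zero in every sample.
filt2 = filtered.loc[keep]
print(filt2)
```

Here OTU1 is zero in both samples, so only OTU2 and OTU3 remain in filt2.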

mjusino commented 7 months ago

I am having this same issue in the filtering step with AMPtk v1.6.0, though I think it might be an M3 Mac issue? (There seem to be a lot of those; I have to use a Rosetta env to get most things to work.) I changed line 300 in filter.py as suggested by the user above and the filtering seemed to work, though I did get these two FutureWarnings:

[Apr 03 12:01 PM]: OS: MacOSX 14.4.1, 16 cores, ~ 67 GB RAM. Python: 3.12.2
[Apr 03 12:01 PM]: AMPtk v1.6.0, VSEARCH v2.27.0
...
[Apr 03 12:01 PM]: Sorting OTU table naturally
[Apr 03 12:01 PM]: Overwriting auto detect index-bleed, setting to 0.500000%
/Users/michelle/miniconda3/envs/rosetta/lib/python3.12/site-packages/amptk/filter.py:559: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use ser.iloc[pos]
  if row[i] == 0:
/Users/michelle/miniconda3/envs/rosetta/lib/python3.12/site-packages/amptk/filter.py:560: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use ser.iloc[pos]
  merge[index].append(row[i])
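
The warning text itself points at the replacement: explicit positional access with .iloc. A minimal sketch of that pattern is below, assuming `i` in the warned lines is an integer position (the thread does not show the surrounding code). The function `zero_positions` and the sample labels are hypothetical and are not taken from filter.py.

```python
import pandas as pd


def zero_positions(row: pd.Series) -> list:
    """Return the integer positions whose value is 0, using .iloc explicitly."""
    # row.iloc[i] always means "the i-th element", regardless of the index labels,
    # so it does not trigger the positional-key FutureWarning.
    return [i for i in range(len(row)) if row.iloc[i] == 0]


# Made-up row with string labels, as in a table whose columns are sample names.
row = pd.Series([0, 3, 0], index=["sampleA", "sampleB", "sampleC"])
print(zero_positions(row))
```

With string labels like these, `row[0]` is the form that pandas 2.x warns about, while `row.iloc[0]` is unambiguous.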

nextgenusfs commented 7 months ago

It's probably just a pandas issue. Try downgrading to a version below v2.0 and see if that works, i.e. python -m pip install "pandas<2.0". Fancy M3! Still using this 2014 MacBook Pro -- although it's def on its last legs!
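
To confirm which pandas version the active environment actually uses, before or after the downgrade, a small standard-library check can help. This is a sketch only; `major_version` and `is_pandas_2_or_later` are hypothetical helper names, not part of AMPtk or pandas.

```python
from importlib.metadata import PackageNotFoundError, version


def major_version(version_string: str) -> int:
    """Return the leading integer of a version string such as '2.2.1'."""
    return int(version_string.split(".")[0])


def is_pandas_2_or_later(version_string: str) -> bool:
    """True when the given pandas version is 2.0 or newer."""
    return major_version(version_string) >= 2


if __name__ == "__main__":
    try:
        installed = version("pandas")
    except PackageNotFoundError:
        print("pandas is not installed in this environment")
    else:
        print(f"pandas {installed}; 2.0 or later: {is_pandas_2_or_later(installed)}")
```

Run it with the same interpreter that runs amptk, so the check reflects the environment AMPtk actually imports from.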

mjusino commented 7 months ago

That worked - the future warnings are gone. Funny thing: I double-checked that pandas was up to date before posting. For reference, I was running pandas v2.2.1 before and am now down to v1.5.3.

I still have my 2014 MacBook Pro and really prefer it, but I'm trying to give it a much-needed break...