soedinglab / metaeuk

MetaEuk - sensitive, high-throughput gene discovery and annotation for large-scale eukaryotic metagenomics
GNU General Public License v3.0

Does MetaEuk provide official test cases? #50

Closed YFLEEL closed 1 year ago

YFLEEL commented 2 years ago

Excuse me, does MetaEuk provide official test cases?

elileka commented 2 years ago

Hi,

For our internal use, we have several very small test cases. These are mainly sequences we edited and cropped to capture the scenarios that are relevant to us (for example, putting together a gene with several exons). We can provide more details if you tell us what exactly you're interested in :)

YFLEEL commented 2 years ago

Thank you very much for your reply! I am deploying the application on a new system, and I need test cases to verify that it is fully functional there. Could you provide such test cases here? Thanks again for your reply :)

milot-mirdita commented 2 years ago

You can run the following from within the repository:

git submodule update --init
cd tests
./run.sh full-path-to/metaeuk

MMseqs2, which MetaEuk is built on, has much more comprehensive tests.

To run the MMseqs2 tests, do the following from within the MMseqs2 repository:

git submodule update --init
./util/regression/run_regression.sh full-path-to/mmseqs2 SCRATCH_DIR

YFLEEL commented 2 years ago

Okay, I'll try it, thanks!

YFLEEL commented 2 years ago

Hello, sorry to trouble you again. After I ran run.sh (/test/run.sh) as you suggested, I got the following error:

Failed at all predictions: number of predictions has changed!

After the run, result folders for three datasets were generated:

minus_strand ---- minus_strand_results
multi_exon ---- multi_exon_results
two_contigs ---- two_contigs_results

The same error occurs when I run test.sh (/test/test.sh) on each dataset separately. Is there a good solution? Thanks!

elileka commented 2 years ago

Hi,

It seems odd because these tests run every time as part of our commit procedure so we wouldn't have a working commit if any of them failed...

Can you perhaps check the output that is produced for one of the folders? I suspect the output doesn't exist or is empty (hence the number of predictions has changed). If that is indeed what happened then the problem is in running MetaEuk on the example prior to the test.
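A minimal sketch of such a check follows. The helper name `check_nonempty` is an assumption made for illustration and is not part of the MetaEuk test scripts; the folder names are the ones reported earlier in this thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of MetaEuk): report whether a result
# file exists and has a size greater than zero.
check_nonempty() {
    if [ -s "$1" ]; then
        echo "ok: $1"
    else
        echo "missing or empty: $1"
        return 1
    fi
}

# Walk the result folders named in this thread and check each regular file.
for dir in minus_strand_results multi_exon_results two_contigs_results; do
    if [ ! -d "$dir" ]; then
        echo "no such folder: $dir"
        continue
    fi
    for f in "$dir"/*; do
        if [ -f "$f" ]; then
            check_nonempty "$f" || true   # keep going after a bad file
        fi
    done
done
```

Run it from the directory that contains the result folders. Any line starting with "missing or empty" points at an output that MetaEuk did not produce.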

YFLEEL commented 2 years ago

Hi, after I rebuilt it, the previous error was gone, but I encountered a new problem. After running run.sh, the tests of the first seven datasets passed, but when it reached taxRes_tax_per_contig.tsv of sacc_tax, it suddenly terminated. The code execution stops at the following position. What might be the reason for this problem?

MMseqs Version: 1da320a9daa75dce5539442b5674f69951a2fe4f
Verbosity   3

Time for processing: 0h 0m 0s 0ms
aggregatetaxweights sacc_tax/swissProtSomeClasses sacc_tax_results/tmpDir/12008456911185907304/preds_map_num_swapped sacc_tax_results/tmpDir/12008456911185907304/tax_per_pred sacc_tax_results/tmpDir/12008456911185907304/tax_per_pred_aln sacc_tax_results/tmpDir/12008456911185907304/tax_per_contig --tax-lineage 1 --compressed 0 --threads 4 -v 3 

MMseqs Version:                 1da320a9daa75dce5539442b5674f69951a2fe4f
Majority threshold              0.5
Vote mode                       1
LCA ranks                       
Column with taxonomic lineage   1
Compressed                      0
Threads                         4
Verbosity                       3

Loading NCBI taxonomy
Loading nodes file ... Done, got 13938 nodes
Loading merged file ... Done, added 0 merged nodes.
Loading names file ... Done
Init RMQ ...Done
[=================================================================] 100.00% 1 eta -
Time for merging to tax_per_contig: 0h 0m 0s 0ms
Time for processing: 0h 0m 0s 24ms
createtsv sacc_tax_results/tmpDir/12008456911185907304/preds sacc_tax_results/tmpDir/12008456911185907304/tax_per_pred sacc_tax_results/taxRes_tax_per_pred.tsv 

MMseqs Version:                     1da320a9daa75dce5539442b5674f69951a2fe4f
First sequence as representative    false
Target column                       1
Add full header                     false
Sequence source                     0
Database output                     false
Threads                             4
Compressed                          0
Verbosity                           3

Time for merging to taxRes_tax_per_pred.tsv: 0h 0m 0s 0ms
Time for processing: 0h 0m 0s 2ms
createtsv sacc_tax_results/tempFolder/latest/contigs sacc_tax_results/tmpDir/12008456911185907304/tax_per_contig sacc_tax_results/taxRes_tax_per_contig.tsv 

MMseqs Version:                     1da320a9daa75dce5539442b5674f69951a2fe4f
First sequence as representative    false
Target column                       1
Add full header                     false
Sequence source                     0
Database output                     false
Threads                             4
Compressed                          0
Verbosity                           3

Time for merging to taxRes_tax_per_contig.tsv: 0h 0m 0s 0ms
Time for processing: 0h 0m 0s 1ms
+ grep Saccharomyces sacc_tax_results/taxRes_tax_per_contig.tsv
NC_001133.9 4932    species Saccharomyces cerevisiae    19  19  17  0.890   -_cellular organisms;d_Eukaryota;-_Opisthokonta;k_Fungi;-_Dikarya;p_Ascomycota;-_saccharomyceta;-_Saccharomycotina;c_Saccharomycetes;o_Saccharomycetales;f_Saccharomycetaceae;g_Saccharomyces;s_Saccharomyces cerevisia

YFLEEL commented 2 years ago

Hello, sorry to trouble you again. I used run.sh for functional testing. The run was interrupted in the middle of the sacc_tax test, and the system did not report an error message. All the preceding test sets passed. If I drop this dataset and keep only the successful ones, is that enough to guarantee that all functions work? Or does a failure of the sacc_tax test set mean that some function is broken and needs to be fixed?

milot-mirdita commented 2 years ago

Was the metaeuk process maybe killed by the OOM-killer? You can check with dmesg -T | egrep -i 'killed process'

None of the tests should fail. Please post the complete output of the test run, maybe we can find a hint to what went wrong.
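One way to capture the complete output together with the exit status is sketched below. The wrapper name `run_logged` and the log file name are assumptions for illustration. The sketch relies on the shell convention that a process terminated by SIGKILL (the signal the OOM killer sends) is reported with exit status 137, that is, 128 + 9.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of MetaEuk): run a command, write its
# stdout and stderr to a log file, and print the command's exit status.
run_logged() {
    log=$1
    shift
    status=0
    "$@" > "$log" 2>&1 || status=$?
    echo "exit status: $status (log: $log)"
    # 137 = 128 + 9 (SIGKILL), the signal sent by the OOM killer
    if [ "$status" -eq 137 ]; then
        echo "killed by SIGKILL; check dmesg for the OOM killer"
    fi
    return "$status"
}

# Example call, using the placeholder path from the earlier comment:
# run_logged test_run.log ./run.sh full-path-to/metaeuk
```

An exit status of 0 together with a log that ends in the last test's output would suggest the run completed normally.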

YFLEEL commented 2 years ago

Hi, the code execution stops at the same position as in the log output I posted above.

Thank you for taking a look at the problem

milot-mirdita commented 2 years ago

Please post the full log output of the run.

YFLEEL commented 2 years ago

Hello, the complete log output is as follows: res.txt

milot-mirdita commented 2 years ago

From your output, it seems to have completed successfully. I am not sure what the issue is.

YFLEEL commented 2 years ago

Hi, when running the other test sets, ALL OKAY is printed at the end of the run, as shown below, but it is not printed at the end of the sacc_tax run, so I thought the run had been interrupted. So you mean this dataset also ran successfully, right?

+ perl compare_fasta_results.pl no_overlap_results/predRedOverAllowed.fas no_overlap_results/predRedNoOver.fas no_overlap/as_should_final_grouped_predictions_rep.fas no_overlap/as_should_final_grouped_predictions_rep_no_overlap.fas
ALL OKAY!

milot-mirdita commented 2 years ago

The last test doesn't seem to write ALL OKAY:

(grep "Saccharomyces" "${RESULTPATH}/taxRes_tax_per_contig.tsv") || (echo "did not match Saccharomyces"; exit 1)

So it's looking all right.
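The check quoted above depends on grep's exit status: grep prints the matching line and exits with 0 when the pattern is found, so the right-hand side of `||` never runs and nothing else is printed. The sketch below wraps the same construct in a function so that both outcomes can be tried. The function name `check_match` and the sample line (abbreviated from the log earlier in this thread) are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical wrapper around the construct used by the last test.
check_match() {
    pattern=$1
    file=$2
    (grep "$pattern" "$file") || (echo "did not match $pattern"; exit 1)
}

# Sample file with one line, abbreviated from the log in this thread.
tsv=$(mktemp)
printf 'NC_001133.9\t4932\tspecies\tSaccharomyces cerevisiae\n' > "$tsv"

# Pattern present: grep prints the line and the check succeeds.
if check_match Saccharomyces "$tsv"; then
    echo "check passed"
fi

# Pattern absent: the fallback message is printed and the check fails.
if ! check_match Candida "$tsv"; then
    echo "check failed"
fi

rm -f "$tsv"
```

In the success case the only output of the check itself is the matching line, which is what the end of the posted log shows.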

YFLEEL commented 2 years ago

Yeah, thank you very much for your patience.