rnajena / bertax

Taxonomic classification of DNA sequences
GNU General Public License v3.0

BERTax test with Bombus terrestris - Error messages #10

Closed ARW-UBT closed 1 year ago

ARW-UBT commented 1 year ago

Hello, thanks for BERTax. It really fills a gap in RNA-Seq analysis workflows.

I have tested BERTax with some 'known' taxa in order to get used to it. I selected 100 cDNA sequences from Bombus terrestris, which is also part of the BERTax reference genomes (GCA_000214255.1). However, the output for this test set did not show B. terrestris as the most likely taxon. In fact, none of the 100 test sequences ended up in the genus Bombus (see the output below; FASTA and TSV files are not allowed as attachments, sorry). Is there an explanation for this unexpected result?

In addition, I receive the following error message. Could this explain the issue described above?

2022-12-06 02:53:57.118535: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
/home/bt140047/miniconda3/envs/bertax/lib/python3.10/site-packages/keras/initializers/initializers_v2.py:120: UserWarning: The initializer VarianceScaling is unseeded and being called multiple times, which will return identical values each time (even if the initializer is unseeded). Please update your code to provide a seed to the initializer, or avoid using the same initalizer instance more than once.
  warnings.warn(

Best regards

######################### bertax output ##############

id superkingdom phylum genus
XR_002308984.1 cdna chromosome_group:Bter_1.0:B01:... Eukaryota (100%) Arthropoda (49%) Ooceraea (47%)
XR_002307712.1 cdna chromosome_group:Bter_1.0:B01:... Eukaryota (100%) Arthropoda (97%) Trichoplusia (28%)
XR_002308309.1 cdna chromosome_group:Bter_1.0:B13:... Eukaryota (99%) Arthropoda (59%) Rhopalosiphum (36%)
XR_002308391.1 cdna chromosome_group:Bter_1.0:B14:... Eukaryota (58%) Mollusca (53%) Crassostrea (50%)
XR_002308163.1 cdna chromosome_group:Bter_1.0:B11:... Eukaryota (67%) Arthropoda (66%) Ooceraea (33%)
XR_002308164.1 cdna chromosome_group:Bter_1.0:B11:... Eukaryota (99%) Arthropoda (59%) Ooceraea (31%)
XR_002308005.1 cdna chromosome_group:Bter_1.0:B09:... Eukaryota (100%) Arthropoda (30%) Caenorhabditis (21%)
XM_012309198.2 cdna chromosome_group:Bter_1.0:B06:... Eukaryota (69%) Arthropoda (44%) Ooceraea (46%)
XM_020863329.1 cdna chromosome_group:Bter_1.0:B06:... Eukaryota (67%) Arthropoda (36%) Ooceraea (26%)
XM_003395929.3 cdna chromosome_group:Bter_1.0:B06:... Eukaryota (98%) Arthropoda (43%) Ooceraea (30%)
XM_003395928.3 cdna chromosome_group:Bter_1.0:B06:... Eukaryota (71%) Arthropoda (35%) Ooceraea (22%)
XM_012315222.2 cdna chromosome_group:Bter_1.0:B12:... Eukaryota (72%) Arthropoda (29%) Nitrososphaera (17%)
XM_012315220.2 cdna chromosome_group:Bter_1.0:B12:... Eukaryota (99%) Apicomplexa (19%) Solanum (19%)
XM_012315218.2 cdna chromosome_group:Bter_1.0:B12:... Eukaryota (99%) Apicomplexa (22%) Phaeodactylum (17%)
XM_012315221.2 cdna chromosome_group:Bter_1.0:B12:... Eukaryota (94%) Streptophyta (24%) Solanum (21%)
XM_012315219.2 cdna chromosome_group:Bter_1.0:B12:... Eukaryota (99%) Apicomplexa (21%) Phaeodactylum (15%)
XM_012308017.2 cdna chromosome_group:Bter_1.0:B04:... Eukaryota (100%) Ascomycota (41%) Drosophila (74%)
XM_020865912.1 cdna chromosome_group:Bter_1.0:B12:... Eukaryota (99%) Arthropoda (100%) Ooceraea (94%)
XM_012317133.2 cdna chromosome_group:Bter_1.0:B15:... Eukaryota (69%) Arthropoda (59%) Ooceraea (45%)
XM_003402617.3 cdna scaffold:Bter_1.0:GL899399:281... Eukaryota (40%) Arthropoda (35%) Ooceraea (29%)
XM_012313924.2 cdna chromosome_group:Bter_1.0:B11:... Eukaryota (98%) Arthropoda (36%) Ooceraea (22%)
XM_012313925.2 cdna chromosome_group:Bter_1.0:B11:... Eukaryota (92%) Arthropoda (34%) Ooceraea (26%)
XM_012320838.2 cdna chromosome_group:Bter_1.0:B03:... Eukaryota (98%) Platyhelminthes (41%) Schistosoma (35%)
XM_003394305.3 cdna chromosome_group:Bter_1.0:B03:... Eukaryota (95%) Arthropoda (76%) Ooceraea (62%)
XM_020868488.1 cdna chromosome_group:Bter_1.0:B03:... Eukaryota (99%) Arthropoda (90%) Ooceraea (53%)
XM_012320836.2 cdna chromosome_group:Bter_1.0:B03:... Eukaryota (87%) Arthropoda (75%) Ooceraea (42%)
XM_012320837.2 cdna chromosome_group:Bter_1.0:B03:... Eukaryota (95%) Arthropoda (43%) Ooceraea (40%)
XM_003400287.3 cdna chromosome_group:Bter_1.0:B13:... Eukaryota (98%) Arthropoda (94%) Ooceraea (48%)
XM_003399441.2 cdna chromosome_group:Bter_1.0:B12:... Bacteria (85%) Bacteroidetes (81%) unknown (35%)
XM_003402536.3 cdna scaffold:Bter_1.0:GL899322:191... Eukaryota (100%) Arthropoda (100%) Apis (100%)
XM_012310587.2 cdna chromosome_group:Bter_1.0:B07:... Eukaryota (61%) Arthropoda (30%) Acyrthosiphon (14%)
XM_012310588.2 cdna chromosome_group:Bter_1.0:B07:... Eukaryota (69%) Arthropoda (29%) Ooceraea (10%)
XM_012310590.2 cdna chromosome_group:Bter_1.0:B07:... Eukaryota (95%) Arthropoda (59%) Acyrthosiphon (22%)
XM_003401212.3 cdna chromosome_group:Bter_1.0:B15:... Eukaryota (86%) Arthropoda (46%) Trichoplusia (19%)
XM_020862821.1 cdna chromosome_group:Bter_1.0:B01:... Eukaryota (68%) Arthropoda (37%) Theileria (31%)
XM_003396613.3 cdna chromosome_group:Bter_1.0:B07:... Eukaryota (99%) Arthropoda (34%) Crassostrea (31%)
XM_020865134.1 cdna chromosome_group:Bter_1.0:B10:... Eukaryota (52%) Uroviricota (37%) Acyrthosiphon (10%)
XM_020867249.1 cdna chromosome_group:Bter_1.0:B16:... Eukaryota (88%) Ascomycota (22%) Solanum (26%)
XM_003395306.3 cdna chromosome_group:Bter_1.0:B05:... Eukaryota (61%) Ascomycota (29%) Ooceraea (54%)
XM_012313976.2 cdna chromosome_group:Bter_1.0:B11:... Eukaryota (98%) Arthropoda (47%) Acyrthosiphon (17%)
XM_012313977.2 cdna chromosome_group:Bter_1.0:B11:... Eukaryota (99%) Arthropoda (63%) Drosophila (33%)
XM_012313975.2 cdna chromosome_group:Bter_1.0:B11:... Eukaryota (87%) Arthropoda (43%) Ooceraea (24%)
XM_020868361.1 cdna chromosome_group:Bter_1.0:B02:... Eukaryota (86%) Arthropoda (85%) Ooceraea (62%)
XM_020868360.1 cdna chromosome_group:Bter_1.0:B02:... Eukaryota (79%) Arthropoda (75%) Ooceraea (62%)
XM_012316912.2 cdna chromosome_group:Bter_1.0:B02:... Eukaryota (90%) Arthropoda (81%) Ooceraea (60%)
XM_020868362.1 cdna chromosome_group:Bter_1.0:B02:... Eukaryota (86%) Arthropoda (74%) Ooceraea (56%)
XM_012316908.2 cdna chromosome_group:Bter_1.0:B02:... Eukaryota (87%) Arthropoda (77%) Ooceraea (64%)
XM_003395930.3 cdna chromosome_group:Bter_1.0:B06:... Eukaryota (41%) Mollusca (25%) Crassostrea (31%)
XM_020862692.1 cdna chromosome_group:Bter_1.0:B04:... Eukaryota (96%) Arthropoda (57%) Ooceraea (25%)
XM_012308144.2 cdna chromosome_group:Bter_1.0:B04:... Eukaryota (98%) Arthropoda (64%) Ooceraea (38%)
XM_003395055.3 cdna chromosome_group:Bter_1.0:B04:... Eukaryota (98%) Arthropoda (85%) Tribolium (29%)
XM_012319421.2 cdna chromosome_group:Bter_1.0:B02:... Eukaryota (79%) Apicomplexa (33%) Theileria (22%)
XM_003393756.3 cdna chromosome_group:Bter_1.0:B02:... Eukaryota (90%) Apicomplexa (44%) Theileria (29%)
XM_020868526.1 cdna chromosome_group:Bter_1.0:B03:... Eukaryota (65%) Chlorophyta (23%) Ooceraea (23%)
XM_012313439.2 cdna chromosome_group:Bter_1.0:B11:... Eukaryota (79%) Arthropoda (53%) Ooceraea (24%)
XM_020865500.1 cdna chromosome_group:Bter_1.0:B11:... Eukaryota (79%) Arthropoda (49%) Ooceraea (24%)
XM_012313440.2 cdna chromosome_group:Bter_1.0:B11:... Eukaryota (75%) Mollusca (33%) Ooceraea (24%)
XM_012315807.2 cdna chromosome_group:Bter_1.0:B14:... Eukaryota (89%) Arthropoda (87%) Brassica (41%)
XM_012310523.2 cdna chromosome_group:Bter_1.0:B07:... Eukaryota (72%) Arthropoda (24%) Ooceraea (24%)
XM_020863545.1 cdna chromosome_group:Bter_1.0:B07:... Eukaryota (99%) Arthropoda (39%) Ooceraea (22%)
XM_012310525.2 cdna chromosome_group:Bter_1.0:B07:... Eukaryota (99%) Arthropoda (41%) Beta (18%)
XM_003394110.3 cdna chromosome_group:Bter_1.0:B03:... Eukaryota (100%) Arthropoda (75%) Ooceraea (43%)
XM_003399740.3 cdna chromosome_group:Bter_1.0:B12:... Eukaryota (100%) Chordata (73%) Ciona (60%)
XM_012309585.2 cdna chromosome_group:Bter_1.0:B01:... Viruses (89%) Peploviricota (87%) Ooceraea (55%)
XM_003393159.3 cdna chromosome_group:Bter_1.0:B01:... Viruses (81%) Peploviricota (68%) Phaeodactylum (29%)
XM_012321104.2 cdna chromosome_group:Bter_1.0:B03:... Eukaryota (50%) Streptophyta (49%) Solanum (40%)
XM_012321103.2 cdna chromosome_group:Bter_1.0:B03:... Eukaryota (53%) Streptophyta (43%) Solanum (45%)
XM_003394620.3 cdna chromosome_group:Bter_1.0:B03:... Eukaryota (50%) Streptophyta (49%) Solanum (35%)
XM_012313942.2 cdna chromosome_group:Bter_1.0:B11:... Eukaryota (97%) Chordata (32%) Ooceraea (26%)
XM_012316212.1 cdna chromosome_group:Bter_1.0:B14:... Eukaryota (99%) Arthropoda (92%) Apis (97%)
XM_012314813.2 cdna chromosome_group:Bter_1.0:B12:... Eukaryota (98%) Arthropoda (50%) Apis (50%)
XM_003399636.3 cdna chromosome_group:Bter_1.0:B12:... Eukaryota (100%) Arthropoda (50%) Apis (50%)
XM_020866083.1 cdna chromosome_group:Bter_1.0:B12:... Viruses (57%) Pisuviricota (54%) Ooceraea (17%)
XM_012316045.2 cdna chromosome_group:Bter_1.0:B14:... Eukaryota (68%) Peploviricota (26%) Olea (14%)
XM_003393799.3 cdna chromosome_group:Bter_1.0:B02:... Eukaryota (71%) Firmicutes (43%) Acyrthosiphon (16%)
XM_003401455.3 cdna chromosome_group:Bter_1.0:B15:... Eukaryota (100%) Arthropoda (76%) Ooceraea (54%)
XM_012308865.2 cdna chromosome_group:Bter_1.0:B05:... Eukaryota (64%) Arthropoda (46%) Apis (24%)
XM_020866407.1 cdna chromosome_group:Bter_1.0:B14:... Viruses (84%) Pisuviricota (38%) Ooceraea (26%)
XM_020866406.1 cdna chromosome_group:Bter_1.0:B14:... Eukaryota (73%) Apicomplexa (54%) Theileria (15%)
XM_020866405.1 cdna chromosome_group:Bter_1.0:B14:... Eukaryota (71%) Apicomplexa (54%) Ooceraea (18%)
XM_020866410.1 cdna chromosome_group:Bter_1.0:B14:... Eukaryota (78%) Apicomplexa (46%) Acyrthosiphon (19%)
XM_020866409.1 cdna chromosome_group:Bter_1.0:B14:... Viruses (80%) Apicomplexa (31%) Ooceraea (34%)
XM_020866408.1 cdna chromosome_group:Bter_1.0:B14:... Eukaryota (83%) Arthropoda (41%) Ooceraea (38%)
XM_020866412.1 cdna chromosome_group:Bter_1.0:B14:... Eukaryota (87%) Apicomplexa (46%) Ooceraea (34%)
XM_020866411.1 cdna chromosome_group:Bter_1.0:B14:... Eukaryota (65%) Apicomplexa (64%) Olea (32%)
XM_003394548.3 cdna chromosome_group:Bter_1.0:B03:... Eukaryota (66%) Arthropoda (79%) Apis (49%)
XM_020862794.1 cdna chromosome_group:Bter_1.0:B04:... Archaea (64%) Peploviricota (26%) Methanobrevibacter (60%)
XM_012311541.2 cdna chromosome_group:Bter_1.0:B09:... Eukaryota (81%) Arthropoda (51%) Apis (36%)
XM_003394183.3 cdna chromosome_group:Bter_1.0:B03:... Eukaryota (100%) Arthropoda (57%) Ooceraea (53%)
XM_003399176.3 cdna chromosome_group:Bter_1.0:B11:... Eukaryota (70%) Arthropoda (62%) Apis (45%)
XM_003399177.3 cdna chromosome_group:Bter_1.0:B11:... Eukaryota (79%) Arthropoda (73%) Apis (48%)
XM_012313573.2 cdna chromosome_group:Bter_1.0:B11:... Eukaryota (81%) Arthropoda (77%) Apis (49%)
XM_012313574.2 cdna chromosome_group:Bter_1.0:B11:... Eukaryota (64%) Arthropoda (54%) Apis (34%)
XM_012311732.2 cdna chromosome_group:Bter_1.0:B09:... Eukaryota (88%) Arthropoda (51%) Apis (34%)
XM_012309428.2 cdna chromosome_group:Bter_1.0:B06:... Eukaryota (95%) Arthropoda (35%) Ooceraea (49%)
XM_003395969.3 cdna chromosome_group:Bter_1.0:B06:... Eukaryota (99%) Arthropoda (45%) Apis (38%)
XM_012309794.2 cdna chromosome_group:Bter_1.0:B07:... Eukaryota (75%) Arthropoda (88%) Ooceraea (27%)
XM_012309795.1 cdna chromosome_group:Bter_1.0:B07:... Eukaryota (90%) Apicomplexa (49%) Plasmodium (50%)
XM_003396215.3 cdna chromosome_group:Bter_1.0:B07:... Eukaryota (81%) Arthropoda (73%) Ooceraea (30%)
XM_020867575.1 cdna scaffold:Bter_1.0:GL898856:161... Eukaryota (76%) Streptophyta (43%) Ooceraea (34%)

flomock commented 1 year ago

Hey, thanks for the interest. The reason you don't see the genus Bombus is that it was not in the training set (there were too few samples for this taxon), so BERTax is not able to predict it. I will include a full list of compatible phyla and genera in the readme file of this repo. For now, you can see here that Bombus is not in the list of potential genera: https://github.com/f-kretschmer/bertax/blob/bc8deca92ae30fd1464bd567444f8db532232fa9/bertax/utils.py#L24

We are aware of this limitation. If you want to train a model that is able to predict Bombus or other taxa of interest, see https://github.com/f-kretschmer/bertax#training-bertax-models .

Further, we expect the prediction quality for phyla to be far better than for genera (if this level of detail is sufficient for you). I also recommend reading the published version of the paper, as it clarifies several aspects of BERTax better: pnas.org
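To check quickly which genera actually show up in a result like the one posted above, the predictions can be tallied per rank. The following is only a minimal sketch: it assumes rows shaped like the pasted output (an id followed by three `Name (NN%)` fields), and the helper names `parse_predictions` and `tally_rank` are made up for this illustration and are not part of BERTax.

```python
import re
from collections import Counter

# One "Name (NN%)" field, as printed in the pasted output (assumed layout).
PREDICTION = re.compile(r"(\S+) \((\d+)%\)")


def parse_predictions(line):
    """Return (superkingdom, phylum, genus) for one row, or None if the
    row does not end in three 'Name (NN%)' fields (e.g. the header)."""
    hits = PREDICTION.findall(line)
    if len(hits) < 3:
        return None
    superkingdom, phylum, genus = (name for name, _ in hits[-3:])
    return superkingdom, phylum, genus


def tally_rank(lines, rank_index):
    """Count predicted names at one rank (0=superkingdom, 1=phylum, 2=genus)."""
    counts = Counter()
    for line in lines:
        parsed = parse_predictions(line)
        if parsed is not None:
            counts[parsed[rank_index]] += 1
    return counts


# Three rows copied from the output above, plus the header line.
rows = [
    "id superkingdom phylum genus",
    "XR_002308984.1 cdna chromosome_group:Bter_1.0:B01:... Eukaryota (100%) Arthropoda (49%) Ooceraea (47%)",
    "XM_003402536.3 cdna scaffold:Bter_1.0:GL899322:191... Eukaryota (100%) Arthropoda (100%) Apis (100%)",
    "XM_003399441.2 cdna chromosome_group:Bter_1.0:B12:... Bacteria (85%) Bacteroidetes (81%) unknown (35%)",
]

genus_counts = tally_rank(rows, 2)
print(genus_counts.most_common())
print("Bombus predicted:", "Bombus" in genus_counts)
```

On the full result, a genus that is missing from every row, together with low genus-level percentages, is what one would expect for a taxon the model cannot output at all.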

ARW-UBT commented 1 year ago

Hi @flomock, thank you for your answer. I looked in the supplementary information of the preprint and found Bombus terrestris in it, which is why I assumed it was included. Anyway, thank you for the PNAS link (I could not see it on bioRxiv).

Do you have any more details on the error message? "Please update your code to provide a seed to the initializer..."

Best regards

flomock commented 1 year ago

Very likely you can ignore both warnings.
The first one just notifies you that you could speed up your predictions by compiling TensorFlow on your machine (which is quite a mess, and I do not recommend it if you only want to play around with BERTax). The second warning is new to me, but as I understand it after looking at Stack Overflow, it only matters when you want to train and need to initialize your model in a random state.

ARW-UBT commented 1 year ago

Hi @flomock, thanks for the info. I installed BERTax as a conda environment. Best regards