urgi-anagen / TE_finder

A suite of C++ programs developed for transposable element search and their annotation in large eukaryotic genome sequence. A part of the REPET package.
https://urgi.versailles.inra.fr/Tools/REPET

DUSTER error hashing query sequence...****** unknown exception catch !!! ****** #3

Closed xiekunwhy closed 2 years ago

xiekunwhy commented 2 years ago

Hi,

I got an error when running DUSTER. Here are the last several lines of the log file:

hashing query sequence...ok
61299 hits found --> Time spent: 0.03 seconds
search fragments...ok
3970 fragments found --> Time spent: 0.04 seconds
merge fragments...ok
53056 ranges found --> Time spent: 0 seconds
==>chunk #1001/1230:99990000..100089989
---direct strand---
hashing query sequence...****** unknown exception catch !!! ******

If you want to reproduce the error, you can download the data from the following link

https://drive.google.com/file/d/1v9QYhMi8MU3jEwT1eyjyQ03vuodcfn96/view?usp=sharing

Best, Kun

hquesneville commented 2 years ago

Dear Kun,

Try removing the AGGGTTT repeats at the end of the chromosome. I have not tested this yet, but my guess is that the error comes from that region.

Best,

Hadi

Hadi Quesneville Directeur de recherche / Research director Administrateur des Données Algorithmes et Codes de la recherche (ADAC) / Chief Data Officer

Unité de Recherches en Génomique-Info (UR INRAE 1164), INRAE, Centre de recherche de Versailles, bat.18 RD10, Route de Saint Cyr 78026 Versailles Cedex, FRANCE

Tél: +33 1 30 83 30 08 mob: +33 6 45 46 78 86 Fax: +33 1 30 83 38 99 http://urgi.versailles.inrae.fr Twitter: @hquesneville LinkedIn : hquesneville


xiekunwhy commented 2 years ago

Dear Hadi,

I used mdust (https://github.com/lh3/mdust) to hard-mask this chromosome but got the same error. The failing region (==>chunk #1001/1230:99990000..100089989) is not at the end of the chromosome (its total length is 123036671 bp), so I don't think the AGGGTTT repeats cause this problem.

By the way, in the HiFi era, long tandem repeats are likely to become more common.

Best, Kun
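
The position argument above can be checked with a few lines of Python. This is only a sketch: the chunk marker and the chromosome length are copied from this thread, while the regular expression and the variable names are illustrative assumptions and not part of DUSTER.

```python
import re

# Chunk marker copied from the DUSTER log quoted in this thread.
marker = "==>chunk #1001/1230:99990000..100089989"
# Total chromosome length reported in this thread.
chrom_len = 123036671

# Assumed marker layout: "==>chunk #<index>/<total>:<start>..<end>".
m = re.match(r"==>chunk #(\d+)/(\d+):(\d+)\.\.(\d+)", marker)
index, total, start, end = map(int, m.groups())

remaining = chrom_len - end   # bases between the failing chunk and the chromosome end
fraction = end / chrom_len    # relative position of the failing chunk's end

print(f"chunk {index}/{total} ends {remaining} bp before the chromosome end")
print(f"relative position: {fraction:.1%}")
```

The failing chunk ends more than 22 Mb before the end of the chromosome, which is consistent with the failure not being caused by repeats at the very end of the sequence.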

hquesneville commented 2 years ago

Dear Kun,

Maybe the chromosome is too long. Try splitting it into chunks.

I apologize for these quick answers without taking the time to investigate more deeply, but I am too busy at the moment and cannot provide more help.

However, thank you for your feedback on this bug.

Best regards,

Hadi


xiekunwhy commented 2 years ago

Yes, the chromosome is too long. DUSTER finished normally when I split it into chunks of 50M each.

Best regards, Kun
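
The thread does not say which tool was used for the split. As a minimal sketch of the workaround, the following Python script cuts every sequence of a FASTA file into fixed-size pieces. The function names, the `name_start_end` header scheme, and the default chunk size of 50,000,000 bases are assumptions made for illustration; they do not come from DUSTER or REPET.

```python
from typing import Iterator, Tuple


def read_fasta(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from a FASTA file."""
    name, parts = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(parts)
                # Keep only the first word of the header as the sequence name.
                name, parts = (line[1:].split() or ["unnamed"])[0], []
            else:
                parts.append(line)
    if name is not None:
        yield name, "".join(parts)


def split_sequence(seq: str, chunk_size: int) -> Iterator[Tuple[int, int, str]]:
    """Yield (start, end, piece) using 1-based inclusive coordinates."""
    for offset in range(0, len(seq), chunk_size):
        piece = seq[offset:offset + chunk_size]
        yield offset + 1, offset + len(piece), piece


def write_chunks(in_path: str, out_path: str,
                 chunk_size: int = 50_000_000, width: int = 60) -> None:
    """Write each chunk as its own FASTA record named <name>_<start>_<end>."""
    with open(out_path, "w") as out:
        for name, seq in read_fasta(in_path):
            for start, end, piece in split_sequence(seq, chunk_size):
                out.write(f">{name}_{start}_{end}\n")
                for i in range(0, len(piece), width):
                    out.write(piece[i:i + width] + "\n")
```

Calling `write_chunks("genome.fa", "genome.chunks.fa")` (both file names are placeholders) would produce one record per 50M-base piece. Coordinates reported on a chunk then need the chunk's start minus one added back to map them onto the original chromosome, and an element that spans a chunk boundary may be reported in two pieces.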