Open zhangrengang opened 1 year ago
The genome fasta is available here: https://ftp.cngb.org/pub/CNSA/data3/CNP0001649/CNS0136789/CNA0036300/P.tabuliformis_V1.0.fa.gz
When I break the chromosomes into contigs, it works.
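The thread does not say how the chromosomes were broken into contigs. As a rough sketch of one possible workaround, the Python function below cuts every record into consecutive pieces of at most `chunk_size` bases while streaming the file line by line. The function name `split_fasta`, the `_partN` naming scheme and the default chunk size are all hypothetical choices made for illustration.

```python
def split_fasta(src_path, dst_path, chunk_size=100_000_000):
    """Write each FASTA record as consecutive pieces of at most chunk_size bases.

    Hypothetical helper: pieces are named <record>_part1, <record>_part2, ...
    """
    name = None   # name of the record currently being read
    part = 0      # index of the current piece within that record
    filled = 0    # number of bases already written to the current piece
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            line = line.rstrip("\r\n")
            if line.startswith(">"):
                fields = line[1:].split()
                name = fields[0] if fields else "unnamed"
                part, filled = 0, 0
                continue
            if name is None:
                continue  # ignore anything before the first header
            while line:
                if filled == 0:
                    # Start a new piece with its own header line.
                    part += 1
                    dst.write(f">{name}_part{part}\n")
                take = line[: chunk_size - filled]
                dst.write(take + "\n")
                filled += len(take)
                line = line[len(take):]
                if filled == chunk_size:
                    filled = 0  # the next base opens a new piece
```

Cutting at fixed positions can split a gene that spans a boundary, so predictions near the cut points may be lost or truncated. Splitting at assembly gaps instead would avoid part of that problem.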
Hi,
Can you please check the input file: busco_3011229/genome.fasta directly (not through fai)?
@milot-mirdita does createdb have a problem reading long sequences like this?
I am closing the other issues you opened because they all seem to throw the same error. If needed, we can reopen them.
Hi @elileka, the FASTA file appears OK:
$ ll busco_3011229/genome.fasta
-rw-r--r-- 3 zrg wlx 25740764543 May 9 19:05 busco_3011229/genome.fasta
$ grep ">" busco_3011229/genome.fasta | head -n 20
>chr1
>chr10
>chr11
>chr12
>chr2
>chr3
>chr4
>chr5
>chr6
>chr7
>chr8
>chr9
>tig00000026
>tig00000069
>tig00000152
>tig00000188
>tig00000204
>tig00000207
>tig00000251
>tig00000280
$ head busco_3011229/genome.fasta
>chr1
GATATTTAGGATCCCCCTAGTGGGGGATCGGCGGAAACGCCCCCGAAGCTAAAAATAGATGTAAAATTTCCTTGTAAAAT
GTTGTAATTTCGTAGCCAATCTAGGTCGTGCATTAGGGAGAGATCTGACGGTAGAAGTTATTTTTAATTTATGGTTTTTT
CCCCTAGAAGGAAACCACTCGCTATATATGAGGGAATTTTATTGCGTCTATGGATATCTATATTATGAGAAAGAAAGAGA
GAGGAGATTGATCGACAGAGAAGAGGGAATTACAAAGGATCTACTGTAGTTTGTATCTCTTTAGTTTGTTGGATAATATA
AAAGGAAGGACTAGCTGTTTCTTCATGGACGTAGCCCAAATTGGGTGAACCACATATATCTGTGTCTCTCTTGTTTTATG
TGTTTCTATTTCTGCAATATATTTTATGTGTTCCATTGCTCTGTAATATATAATTTTCTAATAACCAATATCAGAGCCGA
AGGTCTATTTGGCTGATAAACTCACAAGAGAGAAGGGTTCCTAGTTCGAGTGGGAGCAATGGCAGAAGATGGTAGGTTTA
GGGTTGAAAATTTAATGGCTAAAACTACGAGTTGTGGAAGATGTAGATGGAAGATTATTTGTACTAGAAATATTTGTACC
AACCATTGAGCAGAAAGGCAAAGAAGTGGATGAGTATGACAGACACAGAATGGGATATTCTTGACAGAAAGGCACTTGGA
$ tail busco_3011229/genome.fasta
CTTTGCTTCTCCTCATACCATGAATGCAAACTTTCATCTGAGCTTTGTGACAAGACTCCCACTGAAAATGATAAAGAAAC
CCTAGATGAAGTTTGTATCAATACTTTTTTCAAGCTCACAGTTAGCTAATGGAGAAGGAATTTGGCTACAGACATCATCC
GACACATCAGCTATTGGATCGTGAAAAACTTGAAAAGACTATTACAACCCATTTTCTATTGGTTCCAATGAGATTGCATG
ATTTTATACAGGCTACTCATTTGAAGATATACTATGCAAAATTGCCTTAAAAAACTGAAATGATTCAAAACATAATGGTA
CAGAATCATCATGCATCATTTGATCATATGTTCCAGCTTCCAAGTCTTCATCTGCAACTTGACATTCAATGGAATTCAGG
GGCTGTAAATCTAGATATGGGAAGTCTTGAACAAGGATTTCATTACCCTCTAGATGGTCAAGATCAAAACTTATCGATGC
TTGATTTATGATTGCATGAAATTTGTATGAACTGAAATAAAGTAAAACATAAAGGCAAAGATCCTTCACTTACTTCAAAA
TTTTCTGCACTTTTCTCCTCACTTCCATAGATGATATGCATCGGCTGATCACTGTATTCGAGCTGCTAAAAGTGAAATTC
CTCCTTCCACAAACCAACTGTTGATTTGTCTGCAAGATTAGCTTTTGTTTGAAGAACATAATCATCATCATACTAATCAA
A
Other programs, such as samtools faidx and minimap2, can process it.
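Beyond `head`, `tail` and `grep`, a further direct check of the file (without going through the .fai index) is to stream it and report the length of every record. The following is a minimal Python sketch of that check. The function name `fasta_lengths` is hypothetical, and the 2^31 - 1 threshold in `too_long` is only an illustrative value, not a confirmed limit of MetaEuk or createdb.

```python
def fasta_lengths(path):
    """Yield (record_name, sequence_length) for each FASTA record, line by line."""
    name, length = None, 0
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\r\n")
            if line.startswith(">"):
                if name is not None:
                    yield name, length
                # Use the first word of the header as the record name.
                fields = line[1:].split()
                name, length = (fields[0] if fields else ""), 0
            elif name is not None:
                length += len(line)
    if name is not None:
        yield name, length


def too_long(path, threshold=2**31 - 1):
    """Return records longer than threshold (an assumed, illustrative cutoff)."""
    return [(n, l) for n, l in fasta_lengths(path) if l > threshold]
```

For example, `for name, length in fasta_lengths("busco_3011229/genome.fasta"): print(name, length)` lists every sequence with its size, which makes it easy to see whether the failing records are the longest ones.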
I am sorry that I opened so many duplicate issues; that was caused by a network problem.
Hi, no worries :) Seems like an issue, indeed. We will look into it.
Hi, is the file you sent the same as the file in the example? They have different names...
Does the problem occur with a file that contains only the first chromosome? If so, could you please send this example (that is, a trimmed FASTA file with the sequence of the first chromosome only). It will make it easier for us to debug on a smaller input.
Thank you, Eli
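A trimmed FASTA file of the kind requested above could be produced with a short script. The Python sketch below copies a single named record to a new file and stops reading as soon as that record ends. The function name `extract_record` and the output path in the usage note are hypothetical. It matches the record name exactly, so asking for chr1 does not also pick up chr10, chr11 or chr12.

```python
def extract_record(src_path, dst_path, wanted="chr1"):
    """Copy the record named `wanted` from src_path to dst_path.

    Returns True if the record was found, False otherwise.
    """
    found = False
    copying = False
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            if line.startswith(">"):
                if copying:
                    break  # the wanted record has ended; stop reading
                fields = line[1:].split()
                # Exact match on the first word of the header.
                copying = bool(fields) and fields[0] == wanted
                found = found or copying
            if copying:
                dst.write(line)
    return found
```

A call such as `extract_record("busco_3011229/genome.fasta", "chr1_only.fasta", "chr1")` would write only the first chromosome, giving a smaller input to debug with.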
Hi, it is the same file; I just renamed and uncompressed it. I tested chr1 alone and the same error occurs. How can I send the file to you? Could you give me an email address? I also tested chr10 alone, and it works.
Thank you, I got it :)
Expected Behavior
metaeuk runs normally with other genomes, but crashes with a large pine genome (Pinus tabuliformis, https://www.ncbi.nlm.nih.gov/bioproject/PRJNA784915). Does it not support very long chromosomes?
MetaEuk Output (for bugs)