Closed Srividhya-Sainath closed 2 years ago
If you look in the binaries/uncleaned/31 directory, and look in the log file there, is there an error at the end? Could you attach that file here please?
Thank you for the quick response. Unfortunately, I don't have access to the file now. But the message mentioned something along the lines of memory use for the hash table, and asked whether we had used a quality filter to reduce the memory footprint.
I have a total of 103 E. coli strains, and the genome size varies. So what would you suggest for calculating the correct --mem_height and --mem_width?
First we need to estimate how many kmers you have in 103 E. coli. Due to its open pan-genome, it's more than just those implied by a 5Mb genome plus SNPs. Let's say for now that, if we concatenated all the genes from the 103 genomes, the result would be 15Mb long. So let's guess 15 million kmers, and guess another 15 million kmers due to sequencing errors.
This means we should choose mem_height and mem_width such that 2^mem_height × mem_width is about 15 million.
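As a rough sketch of that sizing rule, the hash table capacity for a few candidate settings can be tabulated as below. The helper name `table_capacity` and the candidate pairs are illustrative assumptions, not part of Cortex; only the rule 2^mem_height × mem_width comes from the explanation above.

```python
def table_capacity(mem_height: int, mem_width: int) -> int:
    """Number of kmer slots in the hash table: 2^mem_height * mem_width."""
    return (2 ** mem_height) * mem_width


target_kmers = 15_000_000  # the guess used in this thread

# Hypothetical candidate settings, chosen only for illustration.
for mem_height, mem_width in [(17, 100), (18, 60), (18, 100), (20, 15)]:
    capacity = table_capacity(mem_height, mem_width)
    print(
        f"--mem_height {mem_height} --mem_width {mem_width}: "
        f"{capacity:,} slots ({capacity / target_kmers:.2f}x the target)"
    )
```

Any pair whose capacity is at or somewhat above the target should do; which pair to prefer is a judgment call that this sketch does not settle.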
How much RAM will that need? Well, see section 7 of the manual for details. The formula is
8+5C+1 bytes per kmer, where C is the number of samples, here 103.
8 + 5×103 + 1 = 524 bytes per kmer. Multiplying by 15 million kmers gives about 7.9 × 10^9 bytes, i.e. roughly 8 GB.
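The same arithmetic can be checked with a few lines of Python. This is only a sketch of the formula as quoted in this thread (8 + 5C + 1 bytes per kmer); the function names are made up for illustration, and the manual remains the authority on the exact formula.

```python
def bytes_per_kmer(num_samples: int) -> int:
    """Bytes per kmer, using the formula quoted above: 8 + 5*C + 1."""
    return 8 + 5 * num_samples + 1


def estimate_table_bytes(num_kmers: int, num_samples: int) -> int:
    """Total bytes needed for num_kmers kmers across num_samples samples."""
    return num_kmers * bytes_per_kmer(num_samples)


num_samples = 103        # number of E. coli strains in this thread
num_kmers = 15_000_000   # the guess used in this thread

total = estimate_table_bytes(num_kmers, num_samples)
print(f"{bytes_per_kmer(num_samples)} bytes per kmer")
print(f"{total:,} bytes total (~{total / 1e9:.2f} GB)")
```

Changing `num_kmers` or `num_samples` shows how quickly the requirement grows with the number of samples, since the per-kmer cost is dominated by the 5C term.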
Did this work out OK, @vidhya-sai?
Hi, yes, this helped and I was able to make it work. Thank you.
Hi,
I am working with a few E. coli strains and wanted to do reference-free variant calling using Cortex. Here is my command:
perl run_calls.pl --first_kmer 31 --fastaq_index /group/bioinf_ecoli_kmer/cortex/INDEX --auto_cleaning yes --genome_size 4000000 --bc yes --pd no --outdir ./results/ --outvcf result_trial1 --ploidy 1 --ref Absent --mem_height 18 --mem_width 100 --do_union yes --workflow joint --logfile logfile_trial1.txt --apply_pop_classifier --vcftools_dir /home/bioinf/vidhy/anaconda3/pkgs/vcftools-0.1.16-he513fc3_4/
I get the following error:
Unable to build /group/bioinf_ecoli_kmer/cortex/scripts/calling/results/binaries/uncleaned/31/SRR14272538.unclean.kmer31.ctx at run_calls.pl line 2137
My Index file:
SRR14272538 . /group/bioinf/cortex/raw/SRR14272538_1.fastq /group/bioin/cortex/raw/SRR14272538_2.fastq
SRR14272623 . /group/bioinf/cortex/raw/SRR14272623_1.fastq /group/bioinf/cortex/raw/SRR14272623_2.fastq
SRR14272622 . /group/bioinf/cortex/raw/SRR14272622_1.fastq /group/bioinf/cortex/raw/SRR14272622_2.fastq
This is relatively new to me. I would be grateful if you could help me work out what I am missing here.
Thank you