Open SergejN opened 3 years ago
You can run the mapper as:
winnowmap -W repetitive_k15.txt -ax map-ont ref.fa ont1.fq.gz ont2.fq.gz ont3.fq.gz ...
Will this resolve your issue?
BTW, you can also tweak the size of the chunk that is processed at a time (assuming you can tolerate more memory usage) using the -I parameter.
> You can run the mapper as:
> winnowmap -W repetitive_k15.txt -ax map-ont ref.fa ont1.fq.gz ont2.fq.gz ont3.fq.gz ...
> Will this resolve your issue?
In theory, yes, but it's also super inconvenient to specify the names of 137 files on the command line.
> BTW, you can also tweak the size of the chunk that is processed at a time (assuming you can tolerate more memory usage) using the -I parameter.
Yes, I saw this parameter, but I had the impression that minimap2 cannot process sequences longer than 4G. I now see that this was incorrect: the limit applies only to a single sequence within the dataset, not to the total length of the sequences. I will give it a try and set -I to the whole genome size (32Gb). Thanks!
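For reference, a dry-run sketch of what that invocation might look like. It assumes that -I accepts a size with a K/M/G suffix, as it does in minimap2 (not verified here for winnowmap), and the single read file name is only a placeholder. The command is assembled and printed rather than executed.

```shell
#!/bin/sh
# Dry run: assemble the command and print it instead of executing it.
# Assumption: -I takes a size with a K/M/G suffix, as in minimap2.
index_chunk=32G

# Store the command as positional parameters so it can be run later with "$@".
set -- winnowmap -W repetitive_k15.txt -I "$index_chunk" -ax map-ont ref.fa ont1.fq.gz

# Join the words with spaces for display only.
cmd=$*
echo "$cmd"

# To execute for real (requires winnowmap and the input files):
# "$@"
```

A larger -I value keeps more of the reference in a single index chunk at the cost of memory, so whether 32G is feasible depends on the machine.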
You might be able to do
winnowmap -W repetitive_k15.txt -ax map-ont ref.fa $(ls -1 *.fq.gz | tr '\n' ' ')
Not tested
*assumes all FASTQ files are desired and have the extension .fq.gz
Yes, sure. This will also work, unless you have to specify so many files that the command line becomes too long (2MB on my system, so quite a few file names):
winnowmap -W repetitive_k15.txt -ax map-ont ref.fa $(find . -name "*.fq.gz" | grep -v 'whatever_you_want_to_exclude' | tr '\n' ' ')
But I wanted to propose a more elegant way. Of course, I can also put the file names into a text file and then run (assuming there are no spaces or other weird characters)
winnowmap -W repetitive_k15.txt -ax map-ont ref.fa $(cat filelist | tr '\n' ' ')
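A sketch of the file-list approach that also tolerates spaces in file names, assuming a hypothetical filelist with one path per line. Instead of an unquoted $(cat ...), it collects the paths with set -- so each line stays a single argument, and it does a rough size comparison against getconf ARG_MAX first. The demo file names are made up, and the winnowmap call is left commented out.

```shell
#!/bin/sh
# Demo input: a hypothetical file list, one path per line (names are made up).
filelist=$(mktemp)
printf '%s\n' 'run 1/ont1.fq.gz' 'run2/ont2.fq.gz' 'run3/ont3.fq.gz' > "$filelist"

# Collect the paths as positional parameters so that each line remains
# one argument, even if it contains spaces.
set --
while IFS= read -r f; do
    if [ -n "$f" ]; then
        set -- "$@" "$f"
    fi
done < "$filelist"
nfiles=$#

# Rough check only: compares the bytes of file names with the system limit.
# The real limit also has to cover the environment and the other arguments.
arg_bytes=$(wc -c < "$filelist" | tr -d ' ')
arg_max=$(getconf ARG_MAX)
if [ "$arg_bytes" -lt "$arg_max" ]; then
    fits=yes
else
    fits=no
fi
echo "$nfiles files, $arg_bytes bytes of names, ARG_MAX=$arg_max, fits=$fits"

# Actual run (requires winnowmap and the real input files):
# winnowmap -W repetitive_k15.txt -ax map-ont ref.fa "$@"

rm -f "$filelist"
```

This is still a workaround: the names end up on the command line, so the length limit applies just as before.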
Dear maintainers,
is it possible to add a way to specify a list of input files instead of a single file? I work with the axolotl genome and have quite a few long reads. Therefore, I have two options:
- zcat the input files into a single huge fastq file, which is a bit wasteful given the amount of data, OR
- zcat the input files and pipe the data to winnowmap.
However, since the genome is too large, minimap2 has to split the index. Therefore, if I pipe the data, winnowmap ends up mapping the reads only to the first 5 scaffolds, which are included in the first index chunk. The other scaffolds are processed afterwards as well, but by then there is no more data in the pipe. It would be nice to be able to specify multiple input files, all of which can be read multiple times if necessary.
I also tried creating the index first by setting -d scaffolds.mmi and then running winnowmap, but in this case I get a segmentation fault.
Thanks!