Closed brunelloandrea closed 1 month ago
Hi, Andrea,
Something doesn't seem right. It seems the input is empty somehow: Total input symbols: 0
Can you see if the gzipped file actually contains something or not?
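One quick way to check this is to count the sequence characters in the gzipped file directly. The following is a minimal sketch, assuming the file is gzipped FASTA text; the helper name `count_sequence_symbols` is hypothetical, and the count it returns is not necessarily computed the same way as the tool's "Total input symbols" figure.

```python
import gzip


def count_sequence_symbols(path: str) -> int:
    """Count sequence characters in a gzipped FASTA file.

    Header lines (those starting with '>') and blank lines are skipped.
    This is an illustrative check, not the tool's own counting logic.
    """
    total = 0
    with gzip.open(path, "rt", encoding="ascii", errors="replace") as fh:
        for line in fh:
            line = line.strip()
            if line and not line.startswith(">"):
                total += len(line)
    return total
```

A result of 0 would suggest that the file is empty or contains only header lines.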
Ok, thanks. With the SARS file from GitHub it works, so the problem is probably that I am creating my files incorrectly. As far as I understand, I should always build a text file with a single contiguous string on a single line, then compress it using gzip (?)
Specifically, in my case, I am considering binary strings made up of many 0s and sparse groups of 1s. The reference string is typically around 1 million characters long, while the other one is typically around 50,000.
I see. The issue should not be the compression or the use of a single line (as long as you use the -f flag, which indicates that the input is in FASTA format). If you use binary strings, you may need to play with the -p and -w parameters to allow the trigger strings to be set. The default values are -w 10 and -p 100. In your case I would probably try -w 5 and -p 30, or something along those lines.
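For reference, a gzipped FASTA file can be produced from a string along the following lines. This is only a sketch: the helper name `write_fasta_gz`, the record name, and the 60-character line width are illustrative choices, and whether a 0/1 alphabet is handled well by the tool is not confirmed here.

```python
import gzip


def write_fasta_gz(path: str, name: str, sequence: str, line_width: int = 60) -> None:
    """Write one sequence as a single-record, gzipped FASTA file.

    A FASTA record is a header line starting with '>' followed by the
    sequence, here wrapped at `line_width` characters per line.
    """
    if line_width <= 0:
        raise ValueError("line_width must be positive")
    with gzip.open(path, "wt", encoding="ascii", newline="\n") as fh:
        fh.write(f">{name}\n")
        for start in range(0, len(sequence), line_width):
            fh.write(sequence[start:start + line_width] + "\n")
```

For example, `write_fasta_gz("reference.fa.gz", "reference", binary_string)` would write the (hypothetical) variable `binary_string` as one record, which could then be passed to the build step together with -f.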
Alright, thank you. How should I interpret those parameters, intuitively? I have read the original article, but I am not quite sure.
Hi, @brunelloandrea, the w and p parameters control the prefix-free parsing step. In particular, the w parameter is the length of a trigger string (i.e., a string that delimits the parsing's phrases), and the p parameter can be interpreted as the average distance between trigger strings.
In practice, trigger strings are identified as all w-mers of the input text whose Karp-Rabin hash is congruent to 0 modulo p.
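To make the role of w and p concrete, the rule above can be sketched as a rolling hash over every window of length w. This is a rough illustration rather than the tool's actual code: the function name `trigger_positions`, the hash base, and the modulus are assumptions chosen for the example, so the positions it reports will generally differ from those the real parser selects.

```python
def trigger_positions(text: str, w: int, p: int,
                      base: int = 256, prime: int = (1 << 61) - 1) -> list[int]:
    """Return start indices of the w-mers whose rolling hash is 0 modulo p.

    `base` and `prime` are illustrative constants (assumptions), not the
    values used by any particular prefix-free parsing implementation.
    """
    if w <= 0 or p <= 0:
        raise ValueError("w and p must be positive")
    data = text.encode("ascii")
    if len(data) < w:
        return []

    # Weight of the leftmost character in a window: base^(w-1) mod prime.
    top = pow(base, w - 1, prime)

    # Hash of the first window.
    h = 0
    for byte in data[:w]:
        h = (h * base + byte) % prime

    hits = [0] if h % p == 0 else []

    # Slide the window: drop the leftmost character, append the next one.
    for i in range(w, len(data)):
        h = ((h - data[i - w] * top) * base + data[i]) % prime
        if h % p == 0:
            hits.append(i - w + 1)
    return hits
```

With a well-mixed hash, roughly one window in p is a trigger, which is why p can be read as the average distance between trigger strings. Note that in a long run of identical characters every window is the same w-mer, so either all of those windows are triggers or none of them are; this may be relevant for strings that are mostly 0s.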
Hello, unfortunately, after installing Moni with the .sh script, I am now facing another issue. When I try to execute the example code:
moni build -r data/SARS-CoV2/SARS-CoV2.1k.fa.gz -o sars-cov2 -f
I get the following error:
I have tried to look for it online, but unfortunately I cannot find anything.