jermp / sshash

A compressed, associative, exact, and weighted dictionary for k-mers.
MIT License
Suggest a suitable value for the parameter m #45

Open jermp opened 5 months ago

jermp commented 5 months ago

Even when m is set explicitly, users should be told which values of m are suitable. As documented in the papers, m should be chosen so that m >= log_4(N) + 1, where N is the cumulative length (number of bases) of the SPSS. However, the software does not currently output this value.

So, if users set m to something below the recommended value, they should be notified.

We can parse the file and compute N; or we can just stat the file and use its size in bytes as an estimate of N, although this works only for uncompressed files.
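
The check described in this issue could be sketched roughly as follows. This is only an illustration in Python, not sshash code: the function names `count_bases`, `min_recommended_m`, and `m_warning` are hypothetical, and the input is assumed to be an uncompressed FASTA-like file whose header lines start with `>`.

```python
def count_bases(lines):
    """Return N, the total number of bases, skipping FASTA header lines.

    `lines` is any iterable of strings, e.g. an open text file
    (assumed to be uncompressed).
    """
    return sum(len(line.strip()) for line in lines if not line.startswith(">"))


def min_recommended_m(num_bases):
    """Smallest integer m with m >= log_4(num_bases) + 1.

    Equivalent to the smallest m such that 4**(m - 1) >= num_bases;
    integer arithmetic avoids floating-point rounding in the logarithm.
    """
    if num_bases < 1:
        raise ValueError("num_bases must be positive")
    m, capacity = 1, 1  # invariant: capacity == 4 ** (m - 1)
    while capacity < num_bases:
        capacity *= 4
        m += 1
    return m


def m_warning(m, num_bases):
    """Return a warning message if m is below the recommended value, else None."""
    recommended = min_recommended_m(num_bases)
    if m < recommended:
        return (f"m = {m} is below the recommended minimum "
                f"m = {recommended} for N = {num_bases} bases")
    return None
```

For example, `min_recommended_m(3_000_000_000)` returns 17, since 4^15 < 3 * 10^9 <= 4^16. If the file size in bytes is used in place of N, it also counts header lines and newlines, so it overestimates N; because the bound is logarithmic in base 4, an overestimate by less than a factor of 4 can raise the suggested m by at most 1.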