yukiteruono / pbsim3

PBSIM3: a simulator for all types of PacBio and ONT long reads
GNU General Public License v2.0

PBSIM3 Doesn't Allow References >1 Billion Nucleotides Long #23

Closed: jwalewski closed this issue 3 months ago

jwalewski commented 4 months ago

Looking at the source code, the limit seems to be set right here: https://github.com/yukiteruono/pbsim3/blob/c0e07a61e1af707a5150b9da4930c4e684523de3/src/pbsim.cpp#L27

While this may seem like an odd complaint, it matters for the simulation of very large genomes (>20Gb).

Is there a reason why 1 billion is the limit? If not, what should the limit be? Would setting it to 5 billion cause crashes or undefined behavior in other parts of the code?

yukiteruono commented 3 months ago

Thank you for using PBSIM. FASTQ_NUM_MAX and FASTQ_LEN_MAX are set to avoid accidentally creating huge data. You can change them to larger values if your environment has enough memory.

yukiteruono commented 3 months ago

I made a mistake; your question was about REF_SEQ_LEN_MAX. This can also be changed.

jwalewski commented 3 months ago

Hey, thanks so much! Will do!
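
For anyone else raising the limit: the question above about whether 5 billion could cause undefined behavior comes down to integer width. The following is a minimal sketch, not code from PBSIM3. The names `kCurrentLimit`, `kCandidateLimit`, and `fits_in_int32` are hypothetical, and it makes no claim about which integer types `pbsim.cpp` actually uses. It only shows that 1 billion fits in a signed 32-bit integer while 5 billion does not, so any 32-bit variable that holds a reference position or length would need to be widened before such a limit is safe.

```cpp
#include <cstdint>
#include <limits>

// Values discussed in the thread. These constants are illustrative only and
// are not copied from the PBSIM3 source.
constexpr std::int64_t kCurrentLimit = 1000000000LL;    // 1 billion (issue title)
constexpr std::int64_t kCandidateLimit = 5000000000LL;  // 5 billion (proposed)

// Returns true if `value` can be stored in a signed 32-bit integer.
constexpr bool fits_in_int32(std::int64_t value) {
    return value >= std::numeric_limits<std::int32_t>::min() &&
           value <= std::numeric_limits<std::int32_t>::max();
}

// Compile-time checks: 1 billion fits in 32 bits, 5 billion does not.
static_assert(fits_in_int32(kCurrentLimit), "1 billion fits in int32_t");
static_assert(!fits_in_int32(kCandidateLimit), "5 billion overflows int32_t");
```

If the constants are edited so that a check no longer holds, compilation fails, which makes this kind of guard a cheap way to catch a limit that has outgrown the type storing it. Whether PBSIM3 needs such a change depends on the types declared in its source, which should be checked directly.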