yukiteruono / pbsim3

PBSIM3: a simulator for all types of PacBio and ONT long reads
GNU General Public License v2.0

median and quartile for length and accuracy #30

Closed nicolo-tellini closed 4 months ago

nicolo-tellini commented 4 months ago

Hi,

In some cases real datasets can be strongly skewed. Could you add an option to specify the length/accuracy median and the length/accuracy Q1/Q3 instead of the mean and SD? Do you think this is feasible to implement, and would it be an improvement?

best

nic

yukiteruono commented 4 months ago

Thank you for using PBSIM3. Your suggestion could be implemented, but it does not seem easy. If I were to simulate strongly skewed data, I would generate several data sets with different parameters and combine them. If you have real data you want to mimic, and it is not PacBio HiFi, I would recommend using sample-based simulation (see README). This will allow you to mimic the length and accuracy distribution of the real data.
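To make the "generate several data sets and combine them" idea concrete, here is a minimal sketch of the statistics only. It does not call PBSIM3. It draws read lengths from a few normal components and pools them, then reports the mean, SD, median, and quartiles of the pooled set. The component values, `MIN_LENGTH`, and the helper names `mixture_lengths` and `summarize` are all assumptions made up for illustration, not PBSIM3 defaults or options.

```python
import random
import statistics

# Hypothetical components: (mean length, length SD, number of reads).
# Illustrative values only; they are not PBSIM3 defaults.
COMPONENTS = [
    (3000, 800, 6000),
    (9000, 2500, 3000),
    (25000, 6000, 1000),
]

MIN_LENGTH = 100  # assumed lower bound so no length is zero or negative


def mixture_lengths(components, seed=0):
    """Draw lengths from each normal component and pool them into one list."""
    rng = random.Random(seed)
    lengths = []
    for mean, sd, n_reads in components:
        for _ in range(n_reads):
            lengths.append(max(MIN_LENGTH, round(rng.gauss(mean, sd))))
    return lengths


def summarize(lengths):
    """Return mean, SD, and quartiles of the pooled lengths."""
    q1, median, q3 = statistics.quantiles(lengths, n=4)
    return {
        "mean": statistics.mean(lengths),
        "sd": statistics.stdev(lengths),
        "q1": q1,
        "median": median,
        "q3": q3,
    }


if __name__ == "__main__":
    for name, value in summarize(mixture_lengths(COMPONENTS)).items():
        print(f"{name}: {value:.1f}")
```

With these assumed components the pooled mean sits well above the median, which is the right-skewed shape the request is about. In practice each component would correspond to one simulation run with its own mean and SD settings, and the resulting read files would be concatenated.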

nicolo-tellini commented 4 months ago

OK! Thanks for the suggestion.