salbrec / seqQscorer

MIT License

Possible use in miRNA NGS seq #8

Open MilosCRF opened 7 months ago

MilosCRF commented 7 months ago

Hi,

I am interested in using pre-trained models for miRNA NGS sequencing projects. Could you please clarify if the existing models on your repository are suitable for this purpose, or would it be necessary to train new models from scratch?

Thanks, Milos

salbrec commented 3 months ago

Hi Milos,

sorry for the late response!

So, ideally, you would have some quality-labeled miRNA samples to train a model specific to this type of data. For example, if you had a set of miRNA samples (maybe 50, better 100) that you had categorized, probably manually, as low or good quality, you could simply create your own model from this dataset. To do so, you would run "deriveFeaturesSets" to get the features and then "trainNewModel" to train the model.
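
With only 50 to 100 labeled samples, it can help to evaluate a newly trained model by cross-validation, keeping both quality classes represented in every fold. The following is a minimal, library-free sketch of such a stratified fold assignment. The function name `stratified_folds`, the sample IDs, and the 0/1 label encoding are assumptions made for illustration; none of this is part of seqQscorer.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds=5, seed=0):
    """Assign each sample to a fold so both quality classes are spread evenly.

    labels: dict mapping a sample ID to its manual label
            (assumed encoding: 0 = good quality, 1 = low quality).
    Returns a dict mapping sample ID -> fold index in range(n_folds).
    """
    rng = random.Random(seed)

    # Group the sample IDs by their manual quality label.
    by_class = defaultdict(list)
    for sample_id, label in labels.items():
        by_class[label].append(sample_id)

    folds = {}
    offset = 0
    for label in sorted(by_class):
        ids = sorted(by_class[label])
        rng.shuffle(ids)
        # Deal the shuffled samples of this class round-robin over the folds.
        # The offset continues where the previous class stopped, so leftover
        # samples do not all pile up in the first fold.
        for i, sample_id in enumerate(ids):
            folds[sample_id] = (i + offset) % n_folds
        offset += len(ids)
    return folds


# Hypothetical manual labels for 10 miRNA samples: 4 low quality, 6 good.
labels = {f"miRNA_{i:02d}": (1 if i < 4 else 0) for i in range(10)}
folds = stratified_folds(labels, n_folds=2)
```

Each fold can then serve once as the held-out test set while a model is trained on the samples in the remaining folds.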

However, most of the features are strongly quality-related, such as the Bowtie mapping features, or even solely quality-related, such as the FastQC features, so they might also work well for miRNA data. Note that none of the models used in the seqQscorer paper has ever seen miRNA data. That said, using only the Bowtie/MAP features and the FastQC/RAW features might work well.

For the other feature sets, LOC and TSS, you would have to test how well they generalize from the training data to your miRNA data.
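
One way to run such a test, assuming you have manual labels for some miRNA samples and a per-sample score from a pre-trained model where a higher score means "more likely low quality", is to compute the area under the ROC curve (AUC). The sketch below does this in plain Python by comparing every low-quality sample with every good-quality sample. The function name `roc_auc` and the example values are hypothetical and only illustrate the idea.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve from pairwise comparisons.

    labels: manual labels (assumed encoding: 1 = low quality, 0 = good).
    scores: model scores, higher = more likely low quality (an assumption).
    Returns the fraction of (low, good) pairs in which the low-quality
    sample received the higher score; ties count as half.
    """
    low = [s for s, y in zip(scores, labels) if y == 1]
    good = [s for s, y in zip(scores, labels) if y == 0]
    if not low or not good:
        raise ValueError("need at least one sample of each class")

    wins = 0.0
    for s_low in low:
        for s_good in good:
            if s_low > s_good:
                wins += 1.0
            elif s_low == s_good:
                wins += 0.5
    return wins / (len(low) * len(good))


# Hypothetical manual labels and model scores for five miRNA samples.
manual = [1, 1, 0, 0, 0]
predicted = [0.9, 0.4, 0.5, 0.2, 0.1]
auc = roc_auc(manual, predicted)
print(f"AUC = {auc:.3f}")
```

An AUC near 1.0 would suggest that the feature set carries over to the miRNA data, while a value near 0.5 would mean the scores are no better than random guessing on these samples.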

I hope this helps. Don't hesitate to ask further questions. I will respond earlier next time, I promise!!! Cheers, Steffen