tleonardi / nanocompore

RNA modifications detection from Nanopore dRNA-Seq data
https://nanocompore.rna.rocks
GNU General Public License v3.0

Speed up by filtering the comparison positions at SampComp step #146

Closed mmiladi closed 4 years ago

mmiladi commented 4 years ago

Hi,

I was wondering if it is possible to run SampComp on a specific region of the transcript(s).
This would speed up the computation when only a specific region of the transcript is of interest, or when the user already knows that part of the transcript has poor coverage in one of the two conditions (as reported by the warning message "Skipping N positions because not present in all samples with sufficient coverage").

A pre-filtering solution based on the collapsed eventalign files would also be fine for my case, but I was not sure how to do that without breaking the consistency between the NanopolishComp Eventalign_collapse index and the collapsed events files.

Thanks, -M

tleonardi commented 4 years ago

Hi, the per-position filtering by coverage is already implemented as part of the whitelisting functions. You have more flexibility if you run Nanocompore from Python and call Whitelist explicitly. For example:

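from nanocompore.SampComp import SampComp
from nanocompore.Whitelist import Whitelist

# One collapsed eventalign file per replicate, grouped by condition
fn_dict = {
    "Cond1": {
        "Cond1_1": "/path/to/out_eventalign_collapse.tsv",
        "Cond1_2": "/path/to/out_eventalign_collapse.tsv",
    },
    "Cond2": {
        "Cond2_1": "/path/to/out_eventalign_collapse.tsv",
        "Cond2_2": "/path/to/out_eventalign_collapse.tsv",
    },
}
fasta = "/path/to/fasta_file"
outdir = "/path/to/out_dir"

# Build the whitelist first; add further Whitelist keyword options here as needed
wl = Whitelist(eventalign_fn_dict=fn_dict, fasta_fn=fasta)

# Pass the whitelist to SampComp; add further SampComp keyword options here as needed
s = SampComp(eventalign_fn_dict=fn_dict, fasta_fn=fasta, outpath=outdir, whitelist=wl)

# Calling the SampComp object runs the comparison
db = s()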

The options supported by Whitelist are described in the documentation. However, at the moment there is no way to explicitly provide a list of positions of interest.
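
To make the idea of per-position coverage filtering concrete, here is a minimal standalone sketch. It is not Nanocompore's actual code: the function name `valid_intervals`, the input layout (one list of per-position read counts per sample), and the threshold of 10 reads are all assumptions made for illustration. It keeps only the positions where every sample reaches the minimum coverage and merges them into intervals.

```python
from typing import Dict, List, Tuple


def valid_intervals(
    coverage: Dict[str, List[int]], min_coverage: int = 10
) -> List[Tuple[int, int]]:
    """Return half-open (start, end) intervals of positions where
    every sample has at least `min_coverage` reads.

    Hypothetical helper for illustration only; not part of Nanocompore.
    """
    lengths = {len(counts) for counts in coverage.values()}
    if len(lengths) != 1:
        raise ValueError("all samples must cover the same number of positions")
    n_pos = lengths.pop()

    intervals: List[Tuple[int, int]] = []
    start = None  # start of the interval currently being extended, if any
    for pos in range(n_pos):
        ok = all(counts[pos] >= min_coverage for counts in coverage.values())
        if ok and start is None:
            start = pos  # open a new interval
        elif not ok and start is not None:
            intervals.append((start, pos))  # close the interval before pos
            start = None
    if start is not None:
        intervals.append((start, n_pos))  # close an interval reaching the end
    return intervals


# Made-up per-position read counts for four samples over six positions
coverage = {
    "Cond1_1": [12, 15, 30, 4, 25, 40],
    "Cond1_2": [11, 18, 22, 9, 21, 35],
    "Cond2_1": [3, 14, 19, 16, 20, 31],
    "Cond2_2": [10, 13, 17, 12, 28, 33],
}
intervals = valid_intervals(coverage, min_coverage=10)
print(intervals)  # [(1, 3), (4, 6)]
```

In this toy input, position 0 is dropped because Cond2_1 has only 3 reads there, and position 3 is dropped because both Cond1 replicates fall below the threshold, which mirrors the situation behind the "Skipping N positions" warning.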