molbiodiv / bcdatabaser

A pipeline to create reference databases for arbitrary markers and taxonomic groups from NCBI data
https://bcdatabaser.molecular.eco
MIT License

Add filter and orient with HMMs #12

Open iimog opened 6 years ago

iimog commented 6 years ago

Currently, only dispr is included for filtering and orientation. However, HMMs would make sense as well, especially as we have good HMMs for the borders of the ITS2.
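
A minimal sketch of what HMM-based filtering and orientation could look like, assuming the HMM search has already produced hits per sequence. The `Hit` record, the `min_score` threshold and `filter_and_orient` are hypothetical names, not part of bcdatabaser. Orientation assumes the nhmmer convention that a reverse-strand hit is reported with its alignment start greater than its end: sequences without a hit above the threshold are dropped, and sequences whose best hit lies on the reverse strand are reverse-complemented.

```python
from dataclasses import dataclass
from typing import Dict, List

# IUPAC-aware complement table; characters not listed (e.g. gaps) are left unchanged.
COMPLEMENT = str.maketrans("ACGTRYKMBDHVNacgtrykmbdhvn", "TGCAYRMKVHDBNtgcayrmkvhdbn")


@dataclass
class Hit:
    """Hypothetical record for one HMM hit on one sequence."""
    seq_id: str
    score: float
    ali_from: int  # alignment start on the sequence (1-based)
    ali_to: int    # alignment end; ali_from > ali_to marks a reverse-strand hit (assumed nhmmer convention)


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def filter_and_orient(seqs: Dict[str, str], hits: List[Hit], min_score: float) -> Dict[str, str]:
    # Keep only the best-scoring hit per sequence that reaches the threshold.
    best: Dict[str, Hit] = {}
    for hit in hits:
        if hit.score < min_score:
            continue
        current = best.get(hit.seq_id)
        if current is None or hit.score > current.score:
            best[hit.seq_id] = hit

    oriented: Dict[str, str] = {}
    for seq_id, seq in seqs.items():
        hit = best.get(seq_id)
        if hit is None:
            continue  # no hit above the threshold: filter the sequence out
        # Reverse-strand hit: flip the sequence so all records share one orientation.
        oriented[seq_id] = reverse_complement(seq) if hit.ali_from > hit.ali_to else seq
    return oriented
```

For example, with `seqs = {"a": "AACCGT", "b": "TTTT", "c": "GGGA"}` and hits `Hit("a", 50, 1, 6)`, `Hit("c", 40, 4, 1)` and `Hit("b", 5, 1, 4)`, a threshold of 20 keeps `a` unchanged, turns `c` into `TCCC` and drops `b`.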

iimog commented 6 years ago

Possible problem: we want to search DNA, and HMMER >= 3.1 has dedicated tools for that, nhmmer and nhmmscan. Their purposes are as follows (see the manual):

But what we would need is the DNA equivalent of:

However, this does not exist. Initial experiments indicate that both nhmmer and nhmmscan could work for our purpose, but there might be problems with either large sequence databases or long query sequences.
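
Whichever of the two tools is used, its `--tblout` table is the easiest output to post-process. The parser below is a sketch that assumes the column order of nhmmer's tabular output (target name, accession, query name, accession, hmmfrom, hmm to, alifrom, ali to, envfrom, env to, sq len, strand, E-value, score, bias, description); check it against the `#` header line written by the installed HMMER version before relying on it. `NhmmerHit` and `parse_tblout` are hypothetical names. With nhmmer the sequence ID ends up in `target`; with nhmmscan the roles are swapped and it is in `query`.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class NhmmerHit:
    target: str
    query: str
    ali_from: int
    ali_to: int
    strand: str  # "+" or "-"
    evalue: float
    score: float


def parse_tblout(lines: Iterable[str]) -> List[NhmmerHit]:
    hits = []
    for line in lines:
        # Skip blank lines and the "#" header/footer comments.
        if not line.strip() or line.startswith("#"):
            continue
        # At most 16 fields: the last column (description) may itself contain spaces.
        fields = line.split(maxsplit=15)
        hits.append(NhmmerHit(
            target=fields[0],
            query=fields[2],
            ali_from=int(fields[6]),
            ali_to=int(fields[7]),
            strand=fields[11],
            evalue=float(fields[12]),
            score=float(fields[13]),
        ))
    return hits
```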

iimog commented 6 years ago

Important: hmmsearch only searches the forward strand (as it would with proteins), whereas nhmmer also considers the reverse strand. So either switch to nhmmer or run hmmsearch on both the original sequences and the reverse complements (rc) of all sequences.
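
For the second option, a sketch of preparing input for hmmsearch: every record is written twice, once as-is and once reverse-complemented under an ID carrying a marker suffix, so a hit on the marked copy identifies a reverse-oriented sequence. The `__rc` suffix and all function names are hypothetical. The doubled FASTA would then be searched with something like `hmmsearch --tblout hits.tbl its2.hmm doubled.fasta` (file names are placeholders).

```python
from typing import Dict, Iterable, Tuple

RC_SUFFIX = "__rc"  # hypothetical marker appended to IDs of reverse-complemented copies

# IUPAC-aware complement table; characters not listed are left unchanged.
COMPLEMENT = str.maketrans("ACGTRYKMBDHVNacgtrykmbdhvn", "TGCAYRMKVHDBNtgcayrmkvhdbn")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Minimal FASTA reader: the ID is the first word after '>'."""
    records: Dict[str, str] = {}
    seq_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                records[seq_id] = "".join(chunks)
            seq_id, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if seq_id is not None:
        records[seq_id] = "".join(chunks)
    return records


def with_reverse_complements(records: Dict[str, str]) -> Dict[str, str]:
    """Each sequence followed by its reverse complement under a suffixed ID."""
    doubled: Dict[str, str] = {}
    for seq_id, seq in records.items():
        doubled[seq_id] = seq
        doubled[seq_id + RC_SUFFIX] = reverse_complement(seq)
    return doubled


def write_fasta(records: Dict[str, str], width: int = 60) -> str:
    lines = []
    for seq_id, seq in records.items():
        lines.append(">" + seq_id)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


def split_hit_id(target_name: str) -> Tuple[str, str]:
    """Map a target name from the hmmsearch hit table back to (original ID, strand)."""
    if target_name.endswith(RC_SUFFIX):
        return target_name[: -len(RC_SUFFIX)], "-"
    return target_name, "+"
```

The trade-off against switching to nhmmer is that the database searched by hmmsearch becomes twice as large.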