jbkerry / oligo

Pipelines for Capture-C oligo design
https://oligo.readthedocs.io
GNU General Public License v3.0

Species "mouse" is not known to RepeatMasker #10

Closed: fbleao closed this issue 1 year ago

fbleao commented 1 year ago

Dear,

I'm trying to use the Capture function in oligo to design probes against the mouse genome mm10. I'm using the following command:

python design.py Capture -f GRCm38.primary_assembly.genome.fa -g mm10 -b targets.bed -e DpnII --blat

I'm getting the following result:

Loading reference fasta file...
...complete
Generating oligos...
...complete. Wrote oligos to oligo_seqs.fa
Checking for repeat sequences in oligos, with RepeatMasker...
...complete. Output written to oligo_seqs.fa.out
Aligning oligos to the genome, with BLAT...
...complete. Output written to blat_out.psl
Traceback (most recent call last):
  File "/Users/felipe/Desktop/Tools/oligo-0.1.1b/design.py", line 544, in <module>
    c.extract_repeats().calculate_density().write_oligo_info()
    ^^^^^^^^^^^^^^^^^^^
  File "/Users/felipe/Desktop/Tools/oligo-0.1.1b/tools.py", line 146, in extract_repeats
    with open('.'.join((self.fasta, 'out'))) as repeats_file:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'oligo_seqs.fa.out'

Checking rm_log.txt shows that the issue arises from RepeatMasker not recognizing the species "mouse", as shown below:

RepeatMasker version 4.1.5
Search Engine: HMMER [ 3.2.1 (June 2018) ]

Using Master RepeatMasker Database: /Users/felipe/Desktop/Tools/RepeatMasker/Libraries/RepeatMaskerLib.h5
  Title : Dfam withRBRM
  Version : 3.7
  Date : 2023-01-11
  Families : 64,595

Species "mouse" is not known to RepeatMasker. There may not be any TE families defined in the libraries for this species/clade or there may be an error in the spelling. Please check your entry against the NCBI Taxonomy database and/or try using a broader clade or related species instead. The full list of species/clades defined in the library may be obtained using the famdb.py script.

If I change the genome in the command from mm10 to hg19, the program works properly. Do you have any idea how to fix the issue? Is there a specific version of RepeatMasker that must be used?

Thank you

jbkerry commented 1 year ago

Thank you. I will start looking into this issue, which is most likely due to a change in how RepeatMasker runs in later versions. I will see if I can update the code to accommodate the new version, or otherwise give clear instructions about RepeatMasker version compatibility.

jbkerry commented 1 year ago

The original oligo software was written against RepeatMasker v4.1.0, but a more recent version changed the accepted taxonomy names for species, so the common name "mouse" is no longer accepted. The GenBank common name "house mouse", the scientific name "Mus musculus", or the taxonomy ID 10090 can be used instead. Since the scientific name seems to be accepted by both the latest versions of RepeatMasker and the originally tested v4.1.0, I will push a code change to use scientific names for both human and mouse. In the meantime, a workaround has been discussed with OP. I will keep this issue open until the code change has been fully incorporated.
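For anyone wanting to see the idea concretely, here is a minimal sketch of mapping a genome build to a scientific name before calling RepeatMasker. It is not the actual oligo code: the names `SPECIES_BY_GENOME`, `species_for` and `repeatmasker_command` are hypothetical, the mapping only covers the two builds mentioned in this thread, and it assumes the species is passed through RepeatMasker's `-species` option.

```python
# Illustrative sketch only; not taken from the oligo source.
# Hypothetical mapping from genome build to the scientific name that
# RepeatMasker is reported to accept in both v4.1.0 and later versions.
SPECIES_BY_GENOME = {
    "hg19": "Homo sapiens",
    "mm10": "Mus musculus",
}


def species_for(genome: str) -> str:
    """Return the scientific name for a genome build, or raise ValueError."""
    try:
        return SPECIES_BY_GENOME[genome]
    except KeyError:
        raise ValueError(f"No species mapping for genome build {genome!r}") from None


def repeatmasker_command(fasta: str, genome: str) -> list[str]:
    """Build the RepeatMasker argument list for a FASTA file and genome build."""
    # Returning a list keeps "Mus musculus" as a single argument, so no
    # shell quoting is needed when it is handed to subprocess.run.
    return ["RepeatMasker", "-species", species_for(genome), fasta]


if __name__ == "__main__":
    print(repeatmasker_command("oligo_seqs.fa", "mm10"))
```

The resulting list could be passed to `subprocess.run(..., check=True)`, which would raise if RepeatMasker exits with a non-zero status rather than letting a later step fail on a missing `.out` file.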

jbkerry commented 1 year ago

A code fix for this has been merged into master and bundled into a new release: 0.1.2.