s312569 / clj-biosequence

A Clojure library designed to make the manipulation of biological sequence data easier.

Regression in 0.4.3 failed parsing #32

Closed averagehat closed 8 years ago

averagehat commented 8 years ago

in 0.4.3

(#clj_biosequence.core.fastaSequence{:acc matches131090Translation, :description nil, 
:alphabet :iupacNucleicAcids, :sequence XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX})

in 0.4.1

(#clj_biosequence.core.fastaSequence{:acc matches131090Translation, :description nil, 
:alphabet :iupacNucleicAcids, :sequence [G C G G G C A G C T A T C T G C T G G A A G A A C T G T T T G A A G G C C A T C T G G A A A A A G A A T G C T G G G A A G A A A T T T G C G T G T A T G A A G A A G C G C G C G A A G T G T T T G A A G A T G A T G A A A C C A C C G A T G A A T T T T G G C G C A C C T A T A T G G G C G G C A G C C C G T G C G C G A G C C A G C C G T G C C T G A A C A A C G G C A G C T G C C A G G A T A G C A T T C G C G G C T A T G C G T G C A C C T G C G C G C C G G G C T A T G A A G G C C C G A A C T G C G C G T T T G C G G A A A G C G A A T G C C A T C C G C T G C G C C T G G A T G G C T G C C A G C A T T T T T G C T A T C C G G G C C C G G A A A G C T A T A C C T G C A G C T G C G C G C G C G G C C A T A A A C T G G G C C A G G A T C G C C G C A G C T G C C T G C C G C A T G A T C G C T G C G C G T G C G G C A C C C T G G G C C C G G A A T G C T G C C A G C G C C C G C A G G G C A G C C A G C A G A A C C T G C T G C C G T T T C C G T G G C A G G T G A A A C T G A C C A A C A G C G A A G G C A A A G A T T T T T G C G G C G G C G T G C T G A T T C A G G A T A A C T T T G T G C T G A C C A C C G C G A C C T G C A G C C T G C T G T A T G C G A A C A T T A G C G T G A A A A C C C G C A G C C A T T T T C G C C T G C A T G T G C G C G G C G T G C A T G T G C A T A C C C G C T T T G A A G C G G A T A C C G G C C A T A A C G A T G T G G C G C T G C T G G A T C T G G C G C G C C C G G T G C G C T G C C C G G A T G C G G G C C G C C C G G T G T G C A C C G C G G A T G C G G A T T T T G C G G A T A G C G T G C T G C T G C C G C A G C C G G G C G T G C T G G G C G G C T G G A C C C T G C G C G G C C G C G A A A T G G T G C C G C T G C G C C T G C G C G T G A C C C A T G T G G A A C C G G C G G A A T G C G G C C G C G C G C T G A A C G C G A C C G T G A C C A C C C G C A C C A G C T G C G A A C G C G G C G C G G C G G C G G G C G C G G C G C G C T G G G T G G C G G G C G G C G C G G T G G T G C G C G A A C A T C G C G G C G C G T G G T T T C T G A C C G 
G C C T G C T G G G C G C G G C G C C G C C G G A A G G C C C G G G C C C G C T G C T G C T G A T T A A A G T G C C G C G C T A T G C G C T G T G G C T G C G C C A G G T G A C C C A G C A G C C G A G C C G C G C G A G C C C G C G C G G C G A T C G C G G C C A G G G C C G C G A T G G C G A A C C G G T G C C G G G C G A T C G C G G C G G C C G C T G G G C G C C G A C C G C G C T G C C G C C G G G C C C G C T G G T G]})

code:

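;; Load a FASTA file with the alphabet that matches its content:
;; nucleic acids when `trans` is truthy, amino acids otherwise.
(defn load-fasta [p trans]
  (if trans
    (init-fasta-file p :iupacNucleicAcids)
    (init-fasta-file p :iupacAminoAcids)))

;; Print every sequence in the file at `path`; DNA input (:is_dna in opts)
;; is six-frame translated first. `conn` is unused in this excerpt.
(defn fasta-search
  "side effect: opens file fasta"
  [path conn opts]
  {:pre [(is-path? path)]}
  (let [translate (:is_dna opts)
        fasta (load-fasta path translate)]
    (with-open [r (bs-reader fasta)]
      (->> (biosequence-seq r)
           ((if translate
              #(mapcat six-frame-translation %)
              identity))
           (println)))))

averagehat commented 8 years ago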

That code is a little misleading. The problem seems to be that lower-case sequences no longer work.

The above code will work with :uncheckedDNA, but then the translation will fail.

Proposed solution:

Arguably we could use an extended alphabet that includes lower case letters. Any opinion?
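
To make the two options concrete, here is a minimal sketch in plain Clojure that does not touch clj-biosequence internals. The names `iupac-nucleic-upper`, `extended-alphabet`, `valid-seq?`, and `upcase-fasta-line` are hypothetical and not part of the library; the symbol set is the standard IUPAC nucleotide code, not the library's own alphabet definition. It contrasts accepting lower-case letters through an extended alphabet with upper-casing sequence lines before they are parsed.

```clojure
(require '[clojure.string :as str])

;; Standard IUPAC nucleotide codes, upper case only (hypothetical name,
;; not the library's own alphabet definition).
(def iupac-nucleic-upper
  (set "ACGTURYSWKMBDHVN"))

;; Option 1: an extended alphabet that also accepts lower-case letters.
(def extended-alphabet
  (into iupac-nucleic-upper
        (str/lower-case (apply str iupac-nucleic-upper))))

(defn valid-seq?
  "True when every character of s belongs to alphabet."
  [alphabet s]
  (every? alphabet s))

;; Option 2: keep the strict alphabet and upper-case sequence lines,
;; leaving FASTA header lines (those starting with ">") untouched.
(defn upcase-fasta-line
  [line]
  (if (str/starts-with? line ">")
    line
    (str/upper-case line)))

(valid-seq? iupac-nucleic-upper "gcgggcag")  ; false with the strict alphabet
(valid-seq? extended-alphabet "gcgggcag")    ; true with the extended alphabet
(valid-seq? iupac-nucleic-upper
            (upcase-fasta-line "gcgggcag"))  ; true after upper-casing
```

Upper-casing before parsing would keep later steps such as translation working on upper-case codons, but it discards any information carried by case (for example soft-masked regions). An extended alphabet preserves case but requires translation to handle lower-case codons as well.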

s312569 commented 8 years ago

Sorry, I was travelling for work and had no time!

I thought lower case may have been the problem and have more comments on your pull request.