Open tmari opened 4 years ago
You are correct: the current version of ProteomeGenerator is not ALT-aware, but this would be a cool feature to add, and we would be happy to test a pull request.
On Nov 15, 2019, at 9:55 AM, tmari wrote:
I was trying proteomegenerator on my data, but the analysis failed with an error at the "UCSC_GTF" step. Looking at the code in pgm
shell: "cat {GTF} | grep chr > {output.reference}; \
cat {input} | grep chr > {output.merged} 2> {log}"
but GTF files don't necessarily have a "chr" prefix in the chromosome name field (source: https://www.ensembl.org/info/website/upload/gff.html).
Is the purpose of UCSC_GTF just to remove the leading lines that start with a hash from the files, or also to remove scaffolds from the GTFs and keep only chromosomal positions? Both are easy fixes, but they are quite different.
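To make the difference between the two possible fixes concrete, here is a minimal Python sketch of both. It is not code from proteomegenerator: the function names `strip_header` and `keep_primary` are hypothetical, and the definition of "chromosomal" (human chromosomes 1-22, X, Y, M/MT, with or without a "chr" prefix) is an assumption that would need adjusting for other organisms or naming schemes.

```python
import re

# Assumption: "chromosomal" means human 1-22, X, Y, M or MT,
# with or without a "chr" prefix. Everything else counts as a scaffold/ALT.
PRIMARY_CHROM = re.compile(r"^(chr)?([1-9]|1[0-9]|2[0-2]|X|Y|M|MT)$")


def strip_header(lines):
    """Option 1: drop only the comment/header lines (those starting with '#')."""
    return [ln for ln in lines if not ln.startswith("#")]


def keep_primary(lines):
    """Option 2: drop header lines and any record not on a primary chromosome."""
    kept = []
    for ln in strip_header(lines):
        seqname = ln.split("\t", 1)[0]  # first GTF column is the sequence name
        if PRIMARY_CHROM.match(seqname):
            kept.append(ln)
    return kept


# Small made-up GTF fragment mixing Ensembl-style and UCSC-style names.
example = [
    "#!genome-build GRCh38\n",
    "1\thavana\tgene\t11869\t14409\t.\t+\t.\tgene_id \"G1\";\n",
    "chrX\tsrc\tgene\t100\t200\t.\t+\t.\tgene_id \"G2\";\n",
    "KI270728.1\tsrc\tgene\t1\t50\t.\t+\t.\tgene_id \"G3\";\n",
    "chrUn_KI270742v1\tsrc\tgene\t1\t50\t.\t-\t.\tgene_id \"G4\";\n",
]

header_only = strip_header(example)   # keeps all four records
primary_only = keep_primary(example)  # keeps only the "1" and "chrX" records
```

On this fragment, a plain `grep chr` would drop the Ensembl-style record on "1" and keep the "chrUn_KI270742v1" scaffold, which matches neither of the two intended behaviours.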