rmhubley / RepeatMasker

RepeatMasker is a program that screens DNA sequences for interspersed repeats and low complexity DNA sequences.

LandScape Script #67

Open ohan-Bioinfo opened 4 years ago

ohan-Bioinfo commented 4 years ago

I tried to generate a repeat landscape with the script after calculating divergence. The resulting HTML is empty and covers only some of the classes, not all of them. Also, if I wanted only specific classes, for example class 1 and 2, would that be possible?

Please advise:

calcDivergenceFromAlign.pl -s example.divsum Genome.fa.align
./createRepeatLandscape.pl -div example.divsum -g 2,034,013,661 > e.html

Error:

Parsing example.divsum
ARTEFACT will not be graphed!
LINE/I-Jockey will not be graphed!
Unspecified will not be graphed!
DNA/hAT-hAT19 will not be graphed!
DNA/Crypton-A will not be graphed!
LTR/Unknown will not be graphed!
LTR/ERV2 will not be graphed!
LTR/ERV3 will not be graphed!
LTR/Undefined will not be graphed!

jebrosen commented 4 years ago

Your -g parameter looks wrong, since it doesn't accept commas: the genome will be interpreted as being 2 base pairs long, instead of 2 Gbp. This might change the generated graph a lot, so please try changing this first.
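For anyone hitting the same thing, here is a minimal Python sketch of the effect described above. It assumes the value is converted the way a leading-digits numeric parse would do it, so everything after the first comma is ignored. `leading_digits_value` and `normalize_genome_size` are hypothetical helper names used only for illustration; they are not part of RepeatMasker.

```python
import re


def leading_digits_value(text: str) -> int:
    """Mimic a parse that keeps only the leading run of digits (assumed behaviour)."""
    match = re.match(r"[0-9]+", text.strip())
    return int(match.group()) if match else 0


def normalize_genome_size(text: str) -> int:
    """Strip thousands separators and return the genome size in bp as an int."""
    cleaned = re.sub(r"[,_\s]", "", text)
    if not re.fullmatch(r"[0-9]+", cleaned):
        raise ValueError(f"not a plain base-pair count: {text!r}")
    return int(cleaned)


if __name__ == "__main__":
    raw = "2,034,013,661"
    print("leading-digits parse:", leading_digits_value(raw))
    print("value to pass to -g: ", normalize_genome_size(raw))
```

Normalizing the number before building the command line avoids depending on how the script itself handles separators.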

> I tried to generate a repeat landscape with the script after calculating divergence. The resulting HTML is empty and covers only some of the classes, not all of them. Also, if I wanted only specific classes, for example class 1 and 2, would that be possible?

This would need some modifications to createRepeatLandscape.pl, and reducing the graph to just class 1 vs. class 2 would remove most of the information it shows! We might be able to make some simple modifications for this, though.

And finally, it looks like createRepeatLandscape.pl needs to be fixed to recognize some newer class names like DNA/Crypton-A and LINE/I-Jockey.

ohan-Bioinfo commented 4 years ago

I changed the -g flag and removed the commas. The HTML now shows the pie chart and the distributions, but I still get the same error.

./createRepeatLandscape.pl -div example.divsum -g 2034013661 > BacteriabGraph.html
Parsing example.divsum
DNA/Crypton-A will not be graphed!
Unspecified will not be graphed!
LTR/ERV3 will not be graphed!
ARTEFACT will not be graphed!
LTR/Unknown will not be graphed!
LTR/Undefined will not be graphed!
DNA/ will not be graphed!
LINE/I-Jockey will not be graphed!
DNA/hAT-hAT19 will not be graphed!
LTR/ERV2 will not be graphed!

jebrosen commented 4 years ago

Some of these look like they should be fixed in createRepeatLandscape.pl: DNA/Crypton-A, LINE/I-Jockey, and DNA/hAT-hAT19. I am working on testing a fix.

LTR/ERV2 and LTR/ERV3 are not RepeatMasker subtypes - where are those names coming from? RepeatMasker's libraries call ERV2 by the name ERVK, and ERV3 as ERVL. It would be best to fix these in the library, but they could be added to createRepeatLandscape.pl. The rest are unknown/undefined classifications, but I am also not familiar with these names - most of them look to me like they should be Unknown instead.
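If fixing the names in the library is the route taken, something like the following Python sketch could do the renaming. It assumes the library is a FASTA file whose headers follow the `>name#Class/Family description` convention, and it only covers the two renames mentioned above. `CLASS_RENAMES` and `rename_header` are hypothetical names for illustration.

```python
import re

# Renames taken from the comment above: ERV2 -> ERVK, ERV3 -> ERVL.
CLASS_RENAMES = {"LTR/ERV2": "LTR/ERVK", "LTR/ERV3": "LTR/ERVL"}

# '>name#' prefix, then the class label up to the first whitespace, then the rest.
HEADER = re.compile(r"^(>[^#\s]+#)(\S+)(.*)$", re.DOTALL)


def rename_header(line: str, renames=CLASS_RENAMES) -> str:
    """Rewrite the class label of a FASTA header; all other lines pass through."""
    match = HEADER.match(line)
    if match is None:
        return line
    prefix, label, tail = match.groups()
    return prefix + renames.get(label, label) + tail


if __name__ == "__main__":
    # Made-up records, only to show the transformation.
    example = [
        ">fam1#LTR/ERV2 example element\n",
        "ACGTACGT\n",
        ">fam2#DNA/hAT\n",
        "TTGACA\n",
    ]
    for line in example:
        print(rename_header(line), end="")
```

Sequence lines and headers with other class labels are left unchanged, so the script can be run over a whole library and the result written to a new file.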

Either way, if these not-graphed elements are in low enough amounts in the genome, it might not make much of a difference that they are missing from the graph. Does the divsum file indicate high enough amounts of these elements that they would be noticeable on the graph?
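One rough way to answer that question is to express each skipped class as a percentage of the genome. The Python sketch below assumes the per-class base-pair totals have already been extracted from the divsum file by some other means; the numbers in the example are made up, and `skipped_share` is a hypothetical helper name.

```python
def skipped_share(class_bp: dict, skipped: list, genome_size: int) -> dict:
    """Return the percentage of the genome covered by each skipped class."""
    if genome_size <= 0:
        raise ValueError("genome_size must be a positive number of base pairs")
    # Classes missing from class_bp are reported as 0 bp.
    return {name: 100.0 * class_bp.get(name, 0) / genome_size for name in skipped}


if __name__ == "__main__":
    # Illustrative values only, NOT taken from any real divsum file.
    totals = {"LTR/ERV2": 1_500_000, "DNA/Crypton-A": 20_000}
    not_graphed = ["LTR/ERV2", "DNA/Crypton-A", "ARTEFACT"]
    shares = skipped_share(totals, not_graphed, 2_034_013_661)
    for name, percent in sorted(shares.items(), key=lambda item: -item[1]):
        print(f"{name}\t{percent:.4f}%")
```

Classes that come out well below the resolution of the graph are unlikely to change how the landscape looks.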

elcortegano commented 2 years ago

Just want to comment here in case it helps somebody with the same issue.

I had this exact error, but the cause was that I was running RepeatMasker with a custom repeat library (using -lib file.fa), and the sequences in the FASTA did not have a repeat class/family that is defined in createRepeatLandscape.pl. Adding class names to the FASTA and manually adding them to createRepeatLandscape.pl worked for me.
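To see which class names a custom library actually uses before running the landscape script, a small Python sketch like this may help. It assumes FASTA headers of the form `>name#Class/Family description`, and it counts headers without a `#Class` part under a placeholder key. `count_library_classes` and `NO_CLASS` are hypothetical names for illustration.

```python
from collections import Counter

NO_CLASS = "(no class)"  # placeholder key for headers without '#Class/Family'


def count_library_classes(lines) -> Counter:
    """Count class/family labels found in the FASTA headers of a repeat library."""
    counts = Counter()
    for line in lines:
        if not line.startswith(">"):
            continue  # sequence line
        fields = line[1:].split()
        token = fields[0] if fields else ""
        # The label is whatever follows the first '#' in the first header token.
        _, sep, label = token.partition("#")
        counts[label if sep and label else NO_CLASS] += 1
    return counts


if __name__ == "__main__":
    # Made-up headers; with a real library, pass an open file handle instead.
    example = [
        ">rnd-1_family-1#DNA/hAT-hAT19 example",
        "ACGT",
        ">rnd-1_family-2",
        "GGCC",
        ">rnd-1_family-3#LTR/ERV2",
        "TTAA",
    ]
    for label, n in count_library_classes(example).most_common():
        print(f"{label}\t{n}")
```

Any label in the output that the script reports as "will not be graphed" is a candidate for renaming in the FASTA or for adding to createRepeatLandscape.pl.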