lindenb / jvarkit

Java utilities for Bioinformatics
https://jvarkit.readthedocs.io/

specify list of scaffolds for wgsCoveragePlotter #202

Open PAMorin opened 2 years ago

PAMorin commented 2 years ago

Verify

Subject of the issue

specify list of scaffolds for wgsCoveragePlotter

Your environment

Steps to reproduce

I want to plot coverage for only a specific subset of scaffolds, but all scaffolds are named sequentially (e.g., scaffold_1, scaffold_2, etc.). Is there a way to pass a list variable or array to --include-contig-regex in place of a single regular expression?

e.g.,

chr=("scaffold_1" "scaffold_2" "scaffold_3" "scaffold_4" "scaffold_5" "scaffold_10")

java -jar ${covplot}/dist/wgscoverageplotter.jar --dimension 1500x500 -C -1 --clip -R ${REFDIR}/${REF} ${BAMDIR}/${BAMFILE} --include-contig-regex ${chr} --percentile median > ${OUTDIR}/${BAMFILE}_covplot_allChrom.svg

Expected behaviour

WGS coverage plot with only the specified scaffolds

Actual behaviour

WGS coverage plot of only the first scaffold in the list $chr
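
The behaviour described here is consistent with how bash expands arrays: referencing an array variable without an index gives only element 0, so `${chr}` reaches the tool as `scaffold_1`. A minimal sketch, assuming the command is run under bash (the array below is a shortened copy of the one in the issue):

```shell
# Bash array, shortened from the one in the issue
chr=("scaffold_1" "scaffold_2" "scaffold_3")

# Referencing the array without an index yields element 0 only
first="${chr}"
echo "${first}"   # scaffold_1

# "${chr[@]}" expands to every element as a separate word
count=0
for name in "${chr[@]}"; do
  count=$((count + 1))
done
echo "${count}"   # 3
```

Expanding every element with `"${chr[@]}"` would not help on its own either, because the option takes a single pattern rather than a list of names.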

lindenb commented 2 years ago

@PAMorin I'm sorry, I don't think I really understand the problem.

How about using a regular expression like

(scaffold_1|scaffold_2|scaffold_3|scaffold_4|scaffold_5|scaffold_10)

?
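
If the scaffold names are already held in a bash array, an alternation like the one above can be built from it instead of typed by hand. A minimal sketch, assuming bash; the variable name `regex` is only illustrative:

```shell
# Scaffold names as a bash array
chr=("scaffold_1" "scaffold_2" "scaffold_3" "scaffold_4" "scaffold_5" "scaffold_10")

# "${chr[*]}" joins the elements with the first character of IFS.
# Setting IFS inside $( ... ) keeps the change local to that subshell.
regex="($(IFS='|'; echo "${chr[*]}"))"

echo "${regex}"   # (scaffold_1|scaffold_2|scaffold_3|scaffold_4|scaffold_5|scaffold_10)
```

The result can then be passed as `--include-contig-regex "${regex}"`. Quoting the expansion is the safer habit, since `|`, `(` and `)` are special to the shell when they appear literally on the command line.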

PAMorin commented 2 years ago

Thanks for the quick response!

It looks like you understood my problem (not understanding how to construct a multiple object expression) perfectly. This works:

chr="scaffold_4|scaffold_5|scaffold_10"

java -jar ${covplot}/dist/wgscoverageplotter.jar --dimension 1500x500 -C -1 --clip -R ${REFDIR}/${REF} ${BAMDIR}/${BAMFILE} --include-contig-regex ${chr} --percentile median  > ${OUTDIR}/${BAMFILE}_covplot_allChrom.svg

The output is attached. My genome assembly has a few hundred scaffolds, but the first 22 are chromosome-length, and I only want to plot coverage of those scaffolds. This approach allows me to specify which ones to plot, and I can further refine it to omit or include a few specific scaffolds of interest in a single plot.
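
For the 22 chromosome-length scaffolds mentioned above, the pattern can be generated in a loop. This is only a sketch: it assumes the names follow the `scaffold_N` form used in this thread, and it uses `grep -E` as a stand-in to check the pattern, not the tool's own Java regex engine. Whether wgscoverageplotter matches the whole contig name or only part of it is not stated in this thread, so the `^` and `$` anchors are added as a precaution against `scaffold_1` also selecting names such as `scaffold_100`.

```shell
# Build scaffold_1 ... scaffold_22 without typing each name
names=()
for ((i = 1; i <= 22; i++)); do
  names+=("scaffold_${i}")
done

# Join with '|' and anchor the alternation at both ends
regex="^($(IFS='|'; echo "${names[*]}"))$"

# Sanity check with grep -E: only scaffold_1 and scaffold_22 should match
matched=$(printf '%s\n' scaffold_1 scaffold_22 scaffold_23 scaffold_100 | grep -E -c "${regex}")
echo "${matched}"   # 2
```

Individual scaffolds can be added or dropped by editing the `names` array before the join.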

Thanks,

Phil
