lindenb / jvarkit

Java utilities for Bioinformatics
https://jvarkit.readthedocs.io/

Biostar78400.jar, Error: "FlowCell id HS2000-450_507 defined twice in XML" #58

Closed muhammadsohailraza closed 8 years ago

muhammadsohailraza commented 8 years ago

Subject of the issue

I have successfully compiled Biostar78400.jar and am trying to run it on my Linux cluster, but it fails with the error message "FlowCell id HS2000-450_507 defined twice in XML".

Your environment

I ran the following command:

java -Xmx10g -jar /dist/biostar78400.jar \
  -o $OUTPUT/sohail-modify.sam \
  -x $INPUT/input.xml \
  $INPUT/sohail1.sam

Expected behavior

It should produce a new BAM file with lane-specific RG tags.

Description of Error and Actual behavior

After running Biostar78400.jar, it prints the error message: "FlowCell id HS2000-450_507 defined twice in XML"

The read names in my BAM file look like this:

HS2000-450_507:4:2115:1889:70619
HS2000-450_507:3:2311:13151:38215
HS2000-450_507:2:2315:18670:41735

As you can see, the flowcell is the same but the lanes differ. Could you please guide me on how to correctly assign RG tags in such a scenario?

One more practical question: is it possible to run sequencing on multiple lanes of the same flowcell?

Thanks!

lindenb commented 8 years ago

example: one flowcell, two lanes

<read-groups>
<flowcell name="HS2000-450_507">
 <lane index="7">
   <group ID="X1">
     <library>L1</library>
     <platform>P1</platform>
     <sample>S1</sample>
     <platformunit>PU1</platformunit>
     <center>C1</center>
     <description>blabla</description>
   </group>
 </lane>
 <lane index="8">
   <group ID="x2">
     <library>L2</library>
     <platform>P2</platform>
     <sample>S2</sample>
     <platformunit>PU1</platformunit>
     <center>C1</center>
     <description>blabla</description>
   </group>
 </lane>
</flowcell>
</read-groups>
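
To illustrate how the layout above nests several lanes under a single flowcell element, here is a minimal Python sketch that reads a trimmed copy of the example and builds a (flowcell, lane) to group ID lookup. It is not the code Biostar78400 uses. The `build_lookup` helper and the duplicate check are assumptions for illustration only, with the check merely inferred from the wording of the reported error message.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example above (only some child elements are kept).
XML = """<read-groups>
<flowcell name="HS2000-450_507">
 <lane index="7">
   <group ID="X1">
     <library>L1</library>
     <sample>S1</sample>
   </group>
 </lane>
 <lane index="8">
   <group ID="x2">
     <library>L2</library>
     <sample>S2</sample>
   </group>
 </lane>
</flowcell>
</read-groups>"""


def build_lookup(xml_text):
    """Map (flowcell name, lane index) -> group ID.

    Hypothetical helper: rejecting a repeated flowcell name is an assumption
    inferred from the error message, not the tool's actual implementation.
    """
    lookup = {}
    seen = set()
    for flowcell in ET.fromstring(xml_text).findall("flowcell"):
        name = flowcell.get("name")
        if name in seen:
            raise ValueError(f"FlowCell id {name} defined twice in XML")
        seen.add(name)
        for lane in flowcell.findall("lane"):
            for group in lane.findall("group"):
                lookup[(name, int(lane.get("index")))] = group.get("ID")
    return lookup


lookup = build_lookup(XML)
print(lookup)
```

Note that this example only declares lanes 7 and 8, while the read names quoted in this issue come from lanes 2, 3 and 4, so the lane indexes would have to be adapted to the actual data.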

muhammadsohailraza commented 8 years ago

Hi, thanks for the solution! I am now getting another error message:

[main] ERROR jvarkit - Read name HS2000-450_507:4:1108:17482:45667 doesn't match regular expression ([a-zA-Z0-9]+):([0-9]):[0-9]+:[0-9]+:[0-9]+.". please check option -p

I am new to this. Could you please help me write a correct regular expression for these read names:

HS2000-450_507:4:2115:1889:70619
HS2000-450_507:3:2311:13151:38215
HS2000-450_507:2:2315:18670:41735

I am getting an error message while running the script: "Does not match Regular expression, please check -p"

The read names look the same as in the example given (https://github.com/lindenb/jvarkit/wiki/Biostar78400), but I still don't know why they do not work with the default parameters.

Thank you!

lindenb commented 8 years ago

Change the first group from [a-zA-Z0-9]+ to [a-zA-Z0-9_\-]+ (i.e. add an underscore and a hyphen to the character class) and pass the new expression with option -p.
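
As a quick check of this change, the following Python sketch tests the read names from this issue against both expressions. The default pattern is copied from the error message above; `parse_read_name` is a hypothetical helper, and matching against the whole read name is an assumption that is consistent with the reported error rather than a description of the tool's internals.

```python
import re

# Default pattern, copied from the error message in this thread.
DEFAULT_PATTERN = r"([a-zA-Z0-9]+):([0-9]):[0-9]+:[0-9]+:[0-9]+"
# Suggested pattern: underscore and hyphen added to the first group.
FIXED_PATTERN = r"([a-zA-Z0-9_\-]+):([0-9]):[0-9]+:[0-9]+:[0-9]+"


def parse_read_name(name, pattern):
    """Return (flowcell, lane) if the whole name matches, else None.

    Hypothetical helper; whole-name matching is an assumption.
    """
    m = re.fullmatch(pattern, name)
    if m is None:
        return None
    return m.group(1), int(m.group(2))


read_names = [
    "HS2000-450_507:4:2115:1889:70619",
    "HS2000-450_507:3:2311:13151:38215",
    "HS2000-450_507:2:2315:18670:41735",
]

for name in read_names:
    print(name,
          parse_read_name(name, DEFAULT_PATTERN),
          parse_read_name(name, FIXED_PATTERN))
```

With the default pattern the first group stops at the hyphen in HS2000-450_507, so the name cannot match as a whole. With the extended character class the first group captures the full flowcell name and the second group captures the lane.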

lindenb commented 8 years ago

closing as it seems resolved on biostars