shendurelab / MIPGEN

One stop MIP design and analysis

mipgen_pipeline_builder input files #36

Closed: sinhaeti closed this issue 6 years ago

sinhaeti commented 6 years ago

Hi, I received MIPgen sequencing data from another person in my group, and I don't understand some of the input files required by mipgen_pipeline_builder.py.

I have paired-end FASTQ reads (2 files). I want to map only the insert, without the other parts of the sequence. Each sequence consists of: forward primer sequence - 4 bp barcode - forward arm - insert - reverse arm - 4 bp barcode - complementary reverse primer sequence

I have what I believe is the MIPgen output file (---.picked_mips.xlsx). Also, the lengths of the ligation and extension arms vary from 18 to 24 bp.
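Because the arm lengths vary from MIP to MIP, the insert cannot be cut out at fixed offsets; it has to be located using each MIP's own arm sequences. The sketch below illustrates that idea with a hypothetical helper, `extract_insert`, which is not part of MIPgen. It assumes a single full-length sequence in the orientation described above (for paired-end data, each mate may contain only one arm, so this would apply after the mates are combined) and exact arm matches. All sequences in the usage example are invented.

```python
from typing import Optional


def extract_insert(read: str, forward_arm: str, reverse_arm: str) -> Optional[str]:
    """Return the bases between the forward arm and the reverse arm.

    Hypothetical helper, not part of MIPgen. Assumes `read` covers the whole
    molecule (primer, barcode, forward arm, insert, reverse arm, barcode,
    primer) in that orientation and that both arms match exactly. The arm
    sequences are passed in per MIP because their lengths (18-24 bp) vary.
    Returns None if either arm is not found.
    """
    arm_start = read.find(forward_arm)
    if arm_start == -1:
        return None
    insert_start = arm_start + len(forward_arm)
    # Look for the reverse arm only downstream of the forward arm.
    insert_end = read.find(reverse_arm, insert_start)
    if insert_end == -1:
        return None
    return read[insert_start:insert_end]


# Usage with invented sequences: primer, barcode, 18 bp arms, insert.
read = ("CTTCAG" + "ACGT" + "GATTACAGATTACAGATT" + "CCCCCCCC"
        + "TGCATGCATGCATGCATG" + "TTAA" + "GGGAAA")
print(extract_insert(read, "GATTACAGATTACAGATT", "TGCATGCATGCATGCATG"))  # prints CCCCCCCC
```

Exact string search keeps the sketch short; real reads with sequencing errors would need approximate matching or alignment instead.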

The pipeline builder asks the following:

what is the path to the index read? [blank if demultiplexed]
what is the path to the file of index sequences you would like to select? [blank not to select]
what is the path to the mip design file? (validity is checked)

What is the index read? How is it different from the file of index sequences? Is the MIP design file the one with the extension .picked_mips?

sinhaeti commented 6 years ago

I also get this error:

what is the path to the mip design file? (validity is checked) ---.picked_mips.xlsx
Traceback (most recent call last):
  File "mipgen_pipeline_builder.py", line 192, in
    m = re.search("(N)CTTCAGCTTCCCGATATCCGACGGTAGTGT(N)",testmip_fields[seq_index])
IndexError: list index out of range
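The traceback fails while indexing `testmip_fields[seq_index]`, i.e. a line of the design file did not have as many fields as the script expected. The sketch below reproduces that failure mode under an assumption not confirmed by the thread: that the builder reads the design file as text and splits each line on tabs. `design_field` is a hypothetical name, and the example lines are invented. An .xlsx workbook is binary ZIP data, so its "lines" typically do not split into the expected columns.

```python
def design_field(line: str, seq_index: int) -> str:
    """Return column `seq_index` of one line of a MIP design file.

    Assumption: the file is read as text and each line is split on tabs,
    as the name `testmip_fields[seq_index]` in the traceback suggests.
    Asking for a column the line does not have raises IndexError, which
    is the same error shown in the traceback.
    """
    fields = line.rstrip("\r\n").split("\t")
    return fields[seq_index]


good_line = "mip_001\tchr1\tACGTACGTACGT\n"  # invented tab-delimited line
print(design_field(good_line, 2))  # prints ACGTACGTACGT

try:
    # Binary workbook bytes with no tabs split into a single field.
    design_field("PK\x03\x04 binary workbook bytes", 2)
except IndexError as err:
    print("IndexError:", err)  # prints IndexError: list index out of range
```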

augustboyle commented 6 years ago

Hello,

The pipeline builder expects a plain-text file, so you should save the picked_mips file as .txt (you can do this from Excel with Save As). The MIP design file is indeed the picked_mips file.
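Before rerunning the builder, it can help to confirm that the file really is text and not still a workbook. The sketch below is a hypothetical pre-check, not part of MIPgen. It relies on one fact (an .xlsx workbook is a ZIP archive, so it begins with the ZIP signature bytes `PK\x03\x04`) and one assumption (that the builder wants tab-delimited columns, which the traceback suggests but the thread does not state).

```python
ZIP_MAGIC = b"PK\x03\x04"  # .xlsx workbooks are ZIP archives and start with these bytes


def is_zip_container(first_bytes: bytes) -> bool:
    """True if the data starts with the ZIP local-file-header signature."""
    return first_bytes.startswith(ZIP_MAGIC)


def header_is_tab_delimited(first_bytes: bytes) -> bool:
    """True if the data is not a ZIP archive and its first line contains a tab."""
    if is_zip_container(first_bytes):
        return False
    first_line = first_bytes.split(b"\n", 1)[0]
    return b"\t" in first_line


def check_design_file(path: str) -> str:
    """Describe whether `path` looks usable as a plain-text, tab-delimited design file."""
    with open(path, "rb") as handle:
        head = handle.read(4096)  # the first few KB are enough for both checks
    if is_zip_container(head):
        return "looks like an Excel workbook (.xlsx); save it as a text file first"
    if header_is_tab_delimited(head):
        return "looks like tab-delimited text"
    return "first line has no tabs; check the export format"
```

For example, `check_design_file("my_design.picked_mips.txt")` (a hypothetical path) returns one of the three messages. Reading raw bytes avoids any text-decoding errors when the file turns out to be binary.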

The index read is the read from the sequencer, following read 1, that identifies which sample the read pair came from. If you have already demultiplexed the reads (i.e., all reads in your read 1 and read 2 files come from the same sample), then you don't need the index read information and can leave that prompt blank. The sample barcode sequences can be specified at the second prompt, but if you have already separated all the samples, you don't need to look them up again and can leave that blank as well.
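To make the difference between the index read and the file of index sequences concrete, here is a hypothetical illustration of demultiplexing; it is not MIPgen code. Each read pair is paired with the sequence from its separate index read, pairs are grouped by that sequence, and an optional set of selected barcodes (playing the role of the file of index sequences) restricts which samples are kept. Read IDs and barcodes are invented, and the sketch uses exact matching only.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set, Tuple


def group_by_index(
    records: Iterable[Tuple[str, str]],
    selected: Optional[Set[str]] = None,
) -> Dict[str, List[str]]:
    """Group read-pair IDs by the sequence of their index read.

    Each record is (read_id, index_sequence), where index_sequence comes from
    the separate index read. If `selected` is given, only those barcodes are
    kept. Exact matching only; no mismatches are tolerated.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for read_id, index_seq in records:
        if selected is None or index_seq in selected:
            groups[index_seq].append(read_id)
    return dict(groups)


pairs = [("read1", "ACGTACGT"), ("read2", "TTGGCCAA"), ("read3", "ACGTACGT")]
print(group_by_index(pairs))
# {'ACGTACGT': ['read1', 'read3'], 'TTGGCCAA': ['read2']}
print(group_by_index(pairs, selected={"TTGGCCAA"}))
# {'TTGGCCAA': ['read2']}
```

If the FASTQ files already contain a single sample, this grouping has effectively been done, which is why both prompts can be left blank.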

Hope that helps!

Evan

On July 24, 2018 at 5:05:04 PM, sinhaeti wrote: [quoted the IndexError traceback shown above]