Closed detrout closed 1 year ago
I rebased my parse-wt commit onto your current head so it should work.
Oh, it occurred to me: should this be in the SPLiT-seq directory as a different version? I wasn't completely sure if SPLiT-seq was the v1 protocol.
Hi Diane, could you take a look at this PR and modify the parse spec and files accordingly? The changes mostly have to do with selecting one million sequencing reads and uploading them
https://github.com/IGVF/seqspec/pull/12
thank you!
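For reference, here is a minimal sketch of one way to cut a FASTQ down to one million reads. It assumes "selecting" means taking the first one million records; the linked PR may intend a different selection. The function name `head_fastq` and the file names in the usage note are hypothetical.

```python
import gzip
from itertools import islice


def head_fastq(src_path, dst_path, n_reads=1_000_000):
    """Copy the first n_reads records from one gzipped FASTQ to another.

    Assumes the standard 4-line FASTQ record layout (header, sequence,
    '+', qualities). Returns the number of complete records written.
    """
    lines_written = 0
    with gzip.open(src_path, "rt") as src, gzip.open(dst_path, "wt") as dst:
        # Read at most 4 * n_reads lines; stops early if the file is shorter.
        for line in islice(src, 4 * n_reads):
            dst.write(line)
            lines_written += 1
    return lines_written // 4
```

For paired-end data, calling it once per file with the same `n_reads` (for example `head_fastq("R1.fastq.gz", "R1.1M.fastq.gz")` and the same for R2) keeps the mates in sync, as long as both inputs list reads in the same order.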
Ok I tried to update the pull request with the example fastq where you asked.
Though I do wonder whether, instead of just committing the fastq.gz files, we should be switching to git-lfs for these large files?
Hi Diane, could you please run seqspec check and verify that the spec has no errors?
Sorry I forgot for a while.
I just ran it against 1daef17dae0bcc99178d49b52096b88a9d49b8c4
$python3 -m seqspec.seqspec_check specs/parse-wt-v2/wt-mega-v2.yaml
$
Can you run the spec against the head of the IGVF seqspec main branch?
Oops. I was calling the validator wrong.
I rebased against c52b53002a81fbcfcd41b03c3c8cf35218a69741 and made some fixes.
Though what's the difference between N and X for the sequence string?
Looks great! I merged; could you add the R1.fastq.gz when you get a chance?
Hi both,
Thanks @detrout for providing the specs and whitelist.
I've been trying to figure out the details of ParseBio recently, and just realised that there might be some problems with the current spec for ParseBio.
As far as I know, the ParseBio structure should be:
[10-bp UMI][8-bp Round3 barcode]GTGGCCGATGTTTCGCATCGGCGTACGACT[8-bp Round2 barcode]ATCCACGTGCTTGAGACTGTGG[8-bp Round1 barcode](dT)
See this thread. The above structure seems to be correct based on real data on GEO. In general, the current spec is okay, but the first onlist should be barcode-23_onlist.txt, the second onlist should be barcode-23_onlist.txt as well, and the third onlist should be barcode-1_onlist_v2.txt. In the current version, the order of the onlists is reversed.
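To make the proposed layout concrete, here is a minimal sketch that slices a barcode read at fixed offsets following the structure written above. It assumes the read is in the orientation shown, starts at the UMI, and has no insertions or deletions; real data may need reverse-complementing first. The field names and the helpers `split_read`, `hamming`, and `linkers_match` are made up for illustration and are not part of seqspec.

```python
# Linker sequences copied from the structure described above.
LINKER_1 = "GTGGCCGATGTTTCGCATCGGCGTACGACT"  # 30 bp, between Round3 and Round2
LINKER_2 = "ATCCACGTGCTTGAGACTGTGG"          # 22 bp, between Round2 and Round1

# (field name, length in bp), in 5'->3' order of the barcode read.
LAYOUT = [
    ("umi", 10),
    ("round3_bc", 8),
    ("linker_1", len(LINKER_1)),
    ("round2_bc", 8),
    ("linker_2", len(LINKER_2)),
    ("round1_bc", 8),
]
MIN_LENGTH = sum(length for _, length in LAYOUT)  # 86 bp


def split_read(seq):
    """Slice a read into the fields of LAYOUT using fixed offsets."""
    if len(seq) < MIN_LENGTH:
        raise ValueError(f"read is {len(seq)} bp, need at least {MIN_LENGTH}")
    fields = {}
    pos = 0
    for name, length in LAYOUT:
        fields[name] = seq[pos:pos + length]
        pos += length
    return fields


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def linkers_match(fields, max_mismatches=2):
    """Check both linkers against the expected sequences.

    max_mismatches is an arbitrary tolerance chosen for this sketch.
    """
    return (
        hamming(fields["linker_1"], LINKER_1) <= max_mismatches
        and hamming(fields["linker_2"], LINKER_2) <= max_mismatches
    )
```

Running `split_read` over reads from a public dataset and checking what fraction pass `linkers_match` is one quick way to test whether a proposed structure fits the data; the extracted barcodes at each round can then be compared against the corresponding onlist file.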
Then, I also have a problem with the sequences in barcode-1_onlist_v2.txt. The sequences here do not match real data, for example, SRR13948565.
@detrout : May I ask where the onlist sequences come from? Does ParseBio provide it? The kit is not available in China, so I don't know where to get it.
Finally, sorry if this is not the best place to discuss. We could open a new discussion if needed.
Xi
Hello,
I would be interested in figuring out the linker sequence for the first barcoding step as I'd like to add gene specific primers in the RT step. Did anyone figure out/confirm if the sequences were correct?
Thanks a lot!
This is my stab at a seqspec for the NextSeq and NovaSeq reads we have built using the Parse Biosciences WT-mega v2 kit.
As far as I can tell it validates correctly.