sdparekh / zUMIs

zUMIs: A fast and flexible pipeline to process RNA sequencing data with UMIs
GNU General Public License v3.0

Bug with reading barcode sharing file with variable numbers of columns per row #253

Closed by jeremymsimon 3 years ago

jeremymsimon commented 3 years ago

My barcode sharing file, because of how the experiment was designed, has different numbers of columns per row:

#17-24
ACTCGTAA    CTGCTTTG    AAACGATA    CATGATCA    TTACCTCG    GGGTAGCG    GCCTGCAA    CCGAGAAA    TGGTATAC    ACGGACTC    CGTTCGAG    ACTTACGA    TCTATTAC    TATTTAAG
ATAAGCTC    ACCGTACG    ATTCATGG    TATAGTCG    ATCCGCGA    TGGGCATC
ATCGCATA    TACCTAGA    CCGTTCTA    GCTGCATG    TGGCGCGC    GTCATATG    TGTCTGAA    ATATTGGC
CTGTCCCG    CTAAGGGA    AATTTCTC    TCGTTTCG    CGCGACTA    GAATAATG    TGAAGCAA    ACTGCGCA
TTATTCTG    GCTTATAG    GTTCAACA    ATCATGCA    ACGCCGGC    ACGTTAAC    TTGTCTTA    CCATCTTG
TACGGTTA    CATAGCTA    TTGGGAGA    GAGGTTGA    TGCTTGGG    GCACTGAC    TAAATATC    TTCATCGC
CACAATTG    GAAATTAG    GTGCTAGC    AGGATTAA    CGCCCGGA    AATAGAAC    GCTCGCGG    TCTTAATC
CTTTGGTC    TAATACGC    TTCCGATC    GTTTGTGA    TTCGCTAC    CGAACGTC    AGCGAAAC    GGTTCTTC
AAATAGCA    GCAAATTC    CGTCTAGG    GCTATGCG    GCCGTGTA    CTACCCTA    CGCTTAAA    GTGGGTTC
GACCTTTC    GTCCGTAG    GGTGGAGC    TGCGATCG    TACTCGAA    TATCCGGG    CATTTGGA    AGGTAATA
GAGCACAA    CGTGGTTG    GTCGCGCG    GACAAAGC    GTTACGTA    GGGCGATG    CTATTTCA    ATCTATAA    ACTATATA    GCCCATGA    TCACTTTA    CTGAAAGG

zUMIs throws a warning from data.table but proceeds anyway:

In data.table::fread(opt$barcodes$barcode_sharing, header = F, skip = 1) :
  Discarded single-line footer: <<GAGCACAA    CGTGGTTG    GTCGCGCG    GACAAAGC    GTTACGTA    GGGCGATG    CTATTTCA    ATCTATAA    ACTATATA    GCCCATGA    TCACTTTA    CTGAAAGG>>

I'm not 100% sure, but it seems that adding the parameter fill=TRUE to that fread call may fix this problem. You may want to test it and make sure downstream steps aren't negatively affected by the change.
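The ragged layout can be confirmed before running the pipeline by counting the tab-separated fields on each row. The sketch below is a Python illustration and not part of zUMIs. The function name `row_widths` and the shortened sample rows are hypothetical, and it assumes the file is tab-separated with a single `#` header line, as in the listing above.

```python
from collections import Counter


def row_widths(text: str) -> list[int]:
    """Return the number of tab-separated fields on each non-header row."""
    widths = []
    for line in text.splitlines():
        # Skip blank lines and the leading "#17-24"-style header line.
        if not line.strip() or line.startswith("#"):
            continue
        widths.append(len(line.split("\t")))
    return widths


# Shortened, illustrative rows (not the full file from this report).
sample = (
    "#17-24\n"
    "ACTCGTAA\tCTGCTTTG\tAAACGATA\n"
    "ATAAGCTC\tACCGTACG\n"
    "ATCGCATA\tTACCTAGA\tCCGTTCTA\n"
)

widths = row_widths(sample)
print(widths)           # field count per row
print(Counter(widths))  # how many rows have each width
if len(set(widths)) > 1:
    print("ragged file: rows differ in column count")
```

If more than one width is reported, a reader that expects a rectangular table will probably need an option such as fill=TRUE, or the file will need padding, before every row is kept.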

cziegenhain commented 3 years ago

Hey,

Thanks for letting me know - I didn't really anticipate this scenario! As you suggested, fill=TRUE can help here if we then escape the resulting empty fields. I just pushed the update to GitHub, and it should be OK to use!

Best, Christoph
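The idea described in this reply, padding short rows and then ignoring the padded empty fields, can be sketched outside of R as follows. This is a Python illustration of the logic only, not the zUMIs code. `pad_rows` and `sharing_map` are hypothetical names, and treating the first column of each row as the main barcode that the remaining columns collapse into is an assumption about the file's meaning.

```python
def pad_rows(rows: list[list[str]]) -> list[list[str]]:
    """Pad every row with "" up to the width of the longest row."""
    width = max((len(r) for r in rows), default=0)
    return [r + [""] * (width - len(r)) for r in rows]


def sharing_map(rows: list[list[str]]) -> dict[str, str]:
    """Map each barcode in a row to that row's first barcode.

    Empty fields introduced by padding are skipped, so they never
    become barcodes of their own.
    """
    mapping = {}
    for row in rows:
        barcodes = [bc for bc in row if bc]  # drop padded "" fields
        if not barcodes:
            continue
        main = barcodes[0]  # assumption: first column is the main barcode
        for bc in barcodes:
            mapping[bc] = main
    return mapping


# Shortened, illustrative rows of unequal length.
rows = [
    ["ACTCGTAA", "CTGCTTTG", "AAACGATA"],
    ["ATAAGCTC", "ACCGTACG"],
]

padded = pad_rows(rows)
print(padded[1])            # second row now ends with an empty field
print(sharing_map(padded))  # no "" key appears in the mapping
```

Without the filtering step, the padded empty string would be treated as a barcode shared by every short row, which is the kind of downstream side effect the original report asks to be checked.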

jeremymsimon commented 3 years ago

Thanks! I am running it now