pachterlab / splitcode

Flexible and efficient parsing, interpreting and editing of sequencing reads
https://pachterlab.github.io/splitcode/
BSD 2-Clause "Simplified" License

A bug regarding the summary JSON output #15

Open JohnMMa opened 1 week ago

JohnMMa commented 1 week ago

I noticed that if the input FASTQ set does not include all the tags listed in the config file, the tag_qc array in the summary JSON file (i.e. the one generated by -s) is not terminated properly, which causes issues for downstream operations.

The attached files are the FASTQ inputs, summary JSON, and config file for the following:

splitcode -c tags.txt --x-only -C 1 -N 3 -t 2 --summary /home/data_datastore/Analysis/[...]/new_local/s8_summary.json exp000705_sample_8_S8_L001_I1_001.fastq.gz exp000705_sample_8_S8_L001_R1_001.fastq.gz exp000705_sample_8_S8_L001_R2_001.fastq.gz
* Using a list of 341 tags (vector size: 341; map size: 12,832; num elements in map: 12,881)
* will process sample 1: sample_8_S8_L001_I1_001.fastq.gz
                         sample_8_S8_L001_R1_001.fastq.gz
                         sample_8_S8_L001_R2_001.fastq.gz
* processing the reads ...
done
* processed 105 reads

When attempting to read s8_summary.json with the standard json module in Python 3.10:

>>> import json
>>> with open("s8_summary.json", 'rt') as fp:
...     json.load(fp)
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/home/jma/miniconda3/envs/base/lib/python3.10/json/__init__.py", line 293, in load
    return loads(fp.read(),
  File "/home/jma/miniconda3/envs/base/lib/python3.10/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/home/jma/miniconda3/envs/base/lib/python3.10/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/home/jma/miniconda3/envs/base/lib/python3.10/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 235 column 2 (char 10629)

Comparing this file with the summary JSONs that work properly, I noticed the following in the final lines:

        {"tag": "bead3-92", "distance": 0, "count": 1},
    ]
}

whereas the last 3 lines of a summary file that works normally look like this:

        {"tag": "bead3-95", "distance": 0, "count": 14048}
    ]
}

Removing the comma at the end of the third-to-last line of s8_summary.json restores compatibility with Python. Is this a Python issue, or a bug in splitcode's summary output code?
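Until a fix is released, the manual comma removal described above could be automated on the reading side. The following is only a minimal sketch of that workaround: the helper name `loads_tolerating_trailing_commas` is hypothetical, and the regular expression assumes that no string value in the summary contains a comma directly followed by `]` or `}`.

```python
import json
import re

# Matches a comma followed only by whitespace and then a closing ] or }.
_TRAILING_COMMA = re.compile(r",(\s*[\]}])")


def loads_tolerating_trailing_commas(text):
    """Parse JSON text after dropping commas that directly precede ] or }."""
    # Caveat (assumption): a string value containing ",]" or ",}" would
    # also be altered; the tag names shown in this issue do not.
    cleaned = _TRAILING_COMMA.sub(r"\1", text)
    return json.loads(cleaned)


# Same shape as the tail of the failing s8_summary.json, shortened.
broken = '''{
    "tag_qc": [
        {"tag": "bead3-92", "distance": 0, "count": 1},
    ]
}'''

# The strict parser rejects the trailing comma ...
try:
    json.loads(broken)
    strict_ok = True
except json.JSONDecodeError:
    strict_ok = False

# ... while the lenient helper recovers the data.
summary = loads_tolerating_trailing_commas(broken)
```

To use it on a real file, read the file contents with `fp.read()` and pass the resulting string to the helper instead of calling `json.load(fp)`.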

sample_8_S8_L001_I1_001.fastq.gz sample_8_S8_L001_R1_001.fastq.gz sample_8_S8_L001_R2_001.fastq.gz s8_summary.json tags.txt

Yenaled commented 1 week ago

Thanks for this -- this is indeed a minor bug; the JSON specification unfortunately prohibits those trailing commas.

I'm tagging this as a bug so I remember to fix this in the next release.
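For illustration only: one plausible way such a trailing comma arises (an assumption, since the splitcode source was not inspected here) is a writer that appends a comma after every entry except the last configured tag while skipping tags that were never observed. A sketch of the usual remedy, placing separators between the entries that are actually written, is shown below in Python; the function name `write_tag_qc` and the nonzero-count filter are hypothetical.

```python
import json


def write_tag_qc(entries):
    """Serialize tag_qc entries as JSON text without a trailing comma."""
    # Assumption for this sketch: entries with a zero count are skipped.
    lines = ["        " + json.dumps(e) for e in entries if e["count"] > 0]
    # Joining puts a comma only between written entries, so the last
    # written entry never carries one, whichever entries were skipped.
    return '{\n    "tag_qc": [\n' + ",\n".join(lines) + "\n    ]\n}"


entries = [
    {"tag": "bead3-92", "distance": 0, "count": 1},
    {"tag": "bead3-95", "distance": 0, "count": 0},  # absent from the input
]
text = write_tag_qc(entries)
parsed = json.loads(text)  # parses cleanly under the strict parser
```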