ssnn-airr closed this issue 2 years ago
Hi! I need more details. Are your pRESTO output files using the pRESTO annotation scheme? Are you using Change-O's `MakeDb.py imgt` to parse the IMGT output? (There is an example here and the documentation is here.)
Reopen if needed
Original report by notrando (Bitbucket: notrando).
Hi there,
Thanks for creating presto, great tool and amazing ecosystem.
I'm trying to analyse the output of presto using the IMGT servers, but unfortunately they truncate the read names, so information is lost, which affects downstream analysis.
There are a few options provided by presto which partially solve the issue: the `ParseHeaders.py` subcommands `add` and `rename`. `add` appends to the end of the read name, so unfortunately it doesn't help. `rename` adds to the start, with some minor issues (like adding NONE| for some odd reason). Neither really provides what is needed: a short unique identifier that can be added to the start of the read name.

I think a simple solution is adding the record number to the start of each read. For example, 100 reads would have
SAMPLE_1
SAMPLE_2
…
SAMPLE_100
added to the start of the read name. The ideal solution would be a new subcommand that renames the headers to the sample record number and then creates a text file mapping the new names to the old ones, for renaming back or referencing.

On a slightly related note, it would be fantastic if the `add` subcommand could add to the start of the read name.

Thanks!
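
Until something like this exists in presto, the renaming scheme described in this report could be approximated with a small standalone script. The following is only a rough sketch, not part of pRESTO: the function names (`rename_fasta_headers`, `mapping_to_tsv`, `restore_header`) are hypothetical, the example headers are purely illustrative, and it assumes plain FASTA input and that a short ID such as `SAMPLE_1` survives IMGT's truncation.

```python
def rename_fasta_headers(lines, prefix="SAMPLE"):
    """Replace each FASTA header with PREFIX_<record number>.

    Returns the renamed lines and a list of (new_id, original_header)
    pairs so the original names can be restored later.
    """
    renamed = []
    mapping = []
    count = 0
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            count += 1
            new_id = f"{prefix}_{count}"
            mapping.append((new_id, line[1:]))  # drop the leading ">"
            renamed.append(f">{new_id}")
        else:
            renamed.append(line)  # sequence lines pass through unchanged
    return renamed, mapping


def mapping_to_tsv(mapping):
    """Format the new-to-old name mapping as tab-separated text."""
    rows = ["new_id\toriginal_header"]
    rows.extend(f"{new_id}\t{original}" for new_id, original in mapping)
    return "\n".join(rows) + "\n"


def restore_header(short_id, mapping):
    """Look up the original header for a short ID returned by a server."""
    return dict(mapping)[short_id]


# Illustrative input only; these headers are made up for the example.
fasta = [
    ">M01|CONSCOUNT=5|PRCONS=IGHM",
    "ACGT",
    ">M02|CONSCOUNT=2|PRCONS=IGHG",
    "GGCC",
]
renamed, mapping = rename_fasta_headers(fasta)
tsv = mapping_to_tsv(mapping)
```

The renamed records would be the ones submitted to IMGT, and the TSV kept alongside them, so that the full annotations can be looked up again with `restore_header` once the results come back.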