Expected Behavior
This may already exist, but I could not find it in the manual or the CLI reference: I would like a way to make the first row of TSV output contain the column names. This would make exploring results with tools such as Pandas or Visidata a lot easier.
Current Behavior
The first line of the TSV file is the first row of the data.
Context
Right now, if I want to use a tool such as Pandas to analyze mmseqs output, I have to pass in the column names manually. Worse, when I shared the data with a collaborator, I had to tell him the column names separately. This is a brittle approach, both for data reuse and for archiving.
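For reference, this is roughly the kind of workaround I mean: a minimal sketch that copies a headerless TSV and prepends a header row. The helper name `add_header` and the `ASSUMED_COLUMNS` list are my own, not part of MMseqs2, and the column order is an assumption based on the 12 BLAST-tab style fields; it has to be adjusted by hand if a custom output format was requested.

```python
import shutil

# ASSUMPTION: the 12 BLAST-tab style columns, in this order.
# Adjust this list if the output was produced with a custom format.
ASSUMED_COLUMNS = [
    "query", "target", "fident", "alnlen", "mismatch", "gapopen",
    "qstart", "qend", "tstart", "tend", "evalue", "bits",
]


def add_header(src, dst, columns=ASSUMED_COLUMNS):
    """Copy the headerless TSV at `src` to `dst`, prepending a header row.

    `add_header` is a hypothetical helper, not an MMseqs2 feature.
    """
    with open(src, "r", newline="") as fin, open(dst, "w", newline="") as fout:
        # Write the column names as the first, tab-separated line.
        fout.write("\t".join(columns) + "\n")
        # Stream the original rows through unchanged.
        shutil.copyfileobj(fin, fout)
```

After this step, a call such as `pandas.read_csv(dst, sep="\t")` picks up the column names from the file, but the list still lives outside the data and can silently drift out of sync with it, which is exactly the brittleness described above.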
Your Environment
Git commit used (The string after "MMseqs Version:" when you execute MMseqs without any parameters): 13.45111
Which MMseqs version was used (Statically-compiled, self-compiled, Homebrew, etc.): Conda
For self-compiled and Homebrew: Compiler and Cmake versions used and their invocation:
Server specifications (especially CPU support for AVX2/SSE and amount of system memory):
@Benjamin-Lee yes, I agree this would be great, but there is currently no way in MMseqs2 to add header lines.
We have already discussed this, and we might add the feature in a future release.