Hoeze closed 7 months ago
Hi @Hoeze! I just ran a quick test locally, and the CSV piper should still work for TSVs. However, the CSV datasource exposes many options, including whether there is a header (out_header) or whether there are comments (out_comment). Can you tell me more about the command you're running?
@karenfeng The issue that I had was that the option should be called outDelimiter instead of out_delimiter.
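To illustrate the naming difference, here is a minimal sketch of a hypothetical helper that rewrites snake_case option names into the camelCase form that worked in this case. The names `to_camel_case` and `camel_case_options` are assumptions for illustration and are not part of Glow.

```python
def to_camel_case(name: str) -> str:
    """Convert a snake_case option name to camelCase (hypothetical helper)."""
    head, *rest = name.split("_")
    # Upper-case only the first letter of each later part; leave the rest untouched.
    return head + "".join(part[:1].upper() + part[1:] for part in rest)


def camel_case_options(options: dict) -> dict:
    """Return a copy of the options with every key converted to camelCase."""
    return {to_camel_case(key): value for key, value in options.items()}


# Option names taken from this thread.
opts = camel_case_options({"out_delimiter": "\t", "out_header": True})
print(opts)
```

The resulting dict could then be passed as keyword arguments (`**opts`) to a call that expects camelCase names.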
However, I'm having another issue now.
For some reason, the options outNullValue and outEmptyValue do not work:
import json
import shlex

import glow

# input_df and vep_cmd are defined elsewhere (not shown here)
vep_transformed_df = glow.transform(
    "pipe",
    input_df.limit(10).distinct(),
    # cmd=json.dumps(shlex.split("cat | grep -v '^##'")),
    cmd=json.dumps(shlex.split(vep_cmd)),
    inputFormatter='vcf',
    inVcfHeader='infer',
    outputFormatter='csv',
    # outQuote="##",
    outHeader=True,
    outDelimiter="\t",
    outNullValue="-",
    outEmptyValue="-",
)
# vep_transformed_df.toPandas()["cDNA_position"].iloc[0]
'-'
Is there again some difference in naming?
Closing since we now support only the text piper and the to/from CSV functions in Spark.
I'm trying to run:
but I still only get a single column. How can I read TSV output?
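As a point of comparison outside of Glow and Spark, here is a minimal sketch of what the delimiter, header and null-value options are meant to achieve, using only Python's standard `csv` module. The function name `read_tsv` and the sample text are assumptions for illustration; only the `cDNA_position` column name and the `-` placeholder come from this thread.

```python
import csv


def read_tsv(text, null_value="-"):
    """Parse tab-separated text with a header row into a list of dicts."""
    # Drop '##' metadata lines, mirroring the `grep -v '^##'` idea above.
    lines = [line for line in text.splitlines() if not line.startswith("##")]
    reader = csv.DictReader(lines, delimiter="\t")
    # Map the placeholder (here "-") to None so that it behaves like a null.
    return [
        {key: (None if value == null_value else value) for key, value in row.items()}
        for row in reader
    ]


# Hypothetical sample output; real tool output will have more columns.
sample = "## some metadata line\nvariant\tcDNA_position\nrs1\t-\nrs2\t123\n"
rows = read_tsv(sample)
print(rows)
```

If the text comes back as a single column, splitting it on tabs this way (or with an equivalent CSV parser) recovers the individual fields.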