Closed laukikpatil closed 10 months ago
@laukikpatil
I imagine this issue is not just on Databricks, as there could be bugs with fields_csv and only_fields for certain inputs.
Do you think you could provide as much of the following as possible? (I understand some data is private, so this may not be possible.)
- The fields_csv you supplied where you get a failure
- The fields_csv file that gets produced if you just run the command without fields_csv or only_fields
It will be very difficult to diagnose without these.
There are definitely cases where removing some fields in fields_csv and using only_fields means that the table structure created no longer makes sense (i.e. when removing intermediate tables in the schema). These will be unavoidable, but I am more concerned that the error message does not explain that this is the case.
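One way to spot a removed intermediate table before running with only_fields is to compare the trimmed fields_csv against the full one from a plain run. The sketch below is not part of flatterer. It assumes both files have a table_name column (an assumption about the file layout), and dropped_tables and the table names in the sample data are hypothetical.

```python
import csv
import io


def tables_in(fields_csv_text):
    """Return the set of table names listed in a fields CSV.

    Assumes a 'table_name' column; this is an assumption about the layout.
    """
    reader = csv.DictReader(io.StringIO(fields_csv_text))
    return {row["table_name"] for row in reader}


def dropped_tables(full_csv_text, trimmed_csv_text):
    """Hypothetical helper: tables in the full CSV that are absent from the trimmed one."""
    return sorted(tables_in(full_csv_text) - tables_in(trimmed_csv_text))


# Made-up sample data: the trimmed file drops the intermediate table
# 'main_items' but keeps its child 'main_items_parts'.
full = "table_name,field_name\nmain,id\nmain_items,id\nmain_items_parts,id\n"
trimmed = "table_name,field_name\nmain,id\nmain_items_parts,id\n"

print(dropped_tables(full, trimmed))  # ['main_items']
```

A table that shows up here while one of its child tables is still present is a likely candidate for the kind of structure that no longer makes sense.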
The above error says that the receiver thread died without really reporting why, which is not good. It looks like you are supplying some kind of Python iterator, e.g. a list or a generator. If possible, you could also try supplying a file instead (even just as an experiment), as files are likely to produce better error messages.
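As a rough sketch of that experiment, the iterator can be written out as newline-delimited JSON and the file path passed to flatten instead. The keyword names used below (ndjson, force, fields_csv, only_fields) come from the flatten signature shown in the traceback; the helper names, the sample records and the values passed are assumptions for illustration only.

```python
import json
import os
import tempfile


def write_ndjson(records, path):
    """Write an iterable of dicts to `path`, one JSON document per line."""
    count = 0
    with open(path, "w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record))
            handle.write("\n")
            count += 1
    return count


def flatten_from_file(path, output_dir, fields_csv):
    """Hypothetical wrapper: run flatterer on a file path rather than an iterator.

    Keyword names are taken from the flatten() signature in the traceback;
    the values are assumptions, adjust them to match your own call.
    """
    import flatterer  # imported lazily so write_ndjson works without it

    flatterer.flatten(path, output_dir, ndjson=True, force=True,
                      fields_csv=fields_csv, only_fields=True)


# Made-up records standing in for the generator that was passed originally.
records = ({"id": i, "items": [{"name": "a"}]} for i in range(3))
ndjson_path = os.path.join(tempfile.mkdtemp(), "input.ndjson")
written = write_ndjson(records, ndjson_path)
print(written)  # 3
```

Calling flatten_from_file(ndjson_path, "output", "fields.csv") would then exercise the same options through the file-based code path, which may surface a more specific error.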
Closing as hard to diagnose without any further information.
The library is able to parse JSON documents with no issues on Databricks. However, I get the error below when I try to use the fields_csv and only_fields parameters.
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-a1667dce-2743-49e2-8f1e-82ca5c50677f/lib/python3.10/site-packages/flatterer/__init__.py:153, in flatten(input, output_dir, csv, xlsx, sqlite, parquet, dataframe, path, main_table_name, emit_obj, ndjson, json_stream, force, fields_csv, only_fields, tables_csv, only_tables, inline_one_to_one, schema, id_prefix, table_prefix, path_separator, schema_titles, sqlite_path, preview, threads, files, log_error, postgres, postgres_schema, drop, pushdown, sql_scripts, evolve, no_link, stats, low_disk, gzip_input, json_path, arrays_new_table)
    150     if s3:
    151         raise AttributeError("s3 output not available when supplying an iterator")
--> 153     iterator_flatten_rs(bytes_generator(input), output_dir, csv, xlsx, sqlite, parquet,
    154                         main_table_name, tables_csv, only_tables, fields_csv, only_fields,
    155                         inline_one_to_one, path_separator, preview,
    156                         table_prefix, id_prefix, emit_obj, force,
    157                         schema, schema_titles, sqlite_path, threads, log_error,
    158                         postgres, postgres_schema, drop, pushdown, sql_scripts, evolve,
    159                         no_link, stats, low_disk, gzip_input, json_path, arrays_new_table)
    160 else:
    161     raise AttributeError("input needs to be a string or a generator of strings, dicts or bytes")
RuntimeError: sending on a disconnected channel