Tried to add column information from Hugging Face, but I couldn't find anything relevant, so I added the dataset description instead. Made the corresponding changes to the base prompt as well.
Updated the column selection prompt by (1) adding an example where the task and the chosen dataset are mismatched, so there are no relevant columns (e.g. using a machine translation dataset for a summarization task), and (2) swapping an old example for a new one.
Fixed a bug in truncate_row -- the "..." is now only added if len(row) > max_length.
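For reviewers, the intended behavior can be sketched roughly as below. This is an illustration only: it assumes the row is a plain string and that the function takes `row` and `max_length` as arguments, which may not match the actual signature in the repo.

```python
def truncate_row(row: str, max_length: int) -> str:
    """Cut row to max_length characters, marking the cut with "...".

    Hypothetical signature for illustration; the real helper may take
    a different row type or extra arguments.
    """
    if len(row) > max_length:
        # Only rows that were actually shortened get the ellipsis.
        return row[:max_length] + "..."
    # Rows that already fit are returned unchanged.
    return row
```

With this check, a row that is exactly max_length characters long is returned as-is rather than gaining a trailing "...".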
Added support for datasets with nested columns (such as opus100), so that these datasets can be used instead of us trying to generate a dataset. The solution was straightforward: the flatten() function from Hugging Face datasets plus some trivial column preprocessing.
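To show what flattening does to a nested column, here is a small pure-Python sketch of the same idea applied to a single row. It is not the code in this PR: `flatten_row` is a hypothetical helper, and it only mimics the dotted column names (e.g. `translation.en`) that flatten() produces for struct columns.

```python
from typing import Any


def flatten_row(row: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Turn nested dict columns into top-level "parent.child" columns.

    Hypothetical helper for illustration; the PR itself relies on the
    flatten() function from Hugging Face datasets.
    """
    flat: dict[str, Any] = {}
    for name, value in row.items():
        full_name = f"{prefix}{name}"
        if isinstance(value, dict):
            # Recurse into the nested column, extending the dotted prefix.
            flat.update(flatten_row(value, prefix=f"{full_name}."))
        else:
            flat[full_name] = value
    return flat


# An opus100-style row: one "translation" column holding both languages.
example = {"translation": {"en": "Hello", "fr": "Bonjour"}}
flattened = flatten_row(example)
```

After flattening, each language shows up as its own column, so the column selection step can treat it like any other flat dataset.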
Description