This PR adds two functions, wrap_tabular_dataset() and wrap_tabular_output().
They make in-memory DataFrame datasets and system outputs easier to use.
"Dataset" and "output" are wrapped separately because:
users may be submitting output for an existing DataLab dataset, in which case no "dataset" needs to be uploaded
"dataset" and "output" use different columns (e.g., "true_label" vs. "predicted_label")
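To illustrate the column split described above, here is a minimal sketch that uses only pandas. It assumes gold labels and predictions start out in a single DataFrame; the `combined`, `dataset_columns`, and `output_columns` names are hypothetical and not part of this PR.

```python
import pandas as pd

# Hypothetical combined table holding both gold labels and predictions.
combined = pd.DataFrame(
    [
        ["A woman with a green headscarf.", "The woman is young.", "neutral", "neutral"],
        ["A woman with a green headscarf.", "The woman is very happy.", "entailment", "neutral"],
    ],
    columns=["text1", "text2", "true_label", "predicted_label"],
)

# Columns that belong to the dataset versus the system output (assumed names).
dataset_columns = ["text1", "text2", "true_label"]
output_columns = ["predicted_label"]

# Split into two frames with matching row order and a fresh 0..n-1 index.
data = combined[dataset_columns].reset_index(drop=True)
predict_data = combined[output_columns].reset_index(drop=True)
```

The two resulting frames have the same shape as the `data` and `predict_data` inputs in the usage example, so each could be passed to its respective wrapper.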
Example Usage
import explainaboard_client
import pandas as pd
client = explainaboard_client.ExplainaboardClient()
data = pd.DataFrame(
    [
        ['snli-dataset', 'This church choir sings to the masses as they sing joyous songs from the book at a church.', 'The church has cracks in the ceiling.', 'neutral'],
        ['snli-dataset', 'This church choir sings to the masses as they sing joyous songs from the book at a church.', 'The church is filled with song.', 'entailment'],
        ['snli-dataset', 'This church choir sings to the masses as they sing joyous songs from the book at a church.', 'A choir singing at a baseball game.', 'contradiction'],
        ['snli-dataset', 'A woman with a green headscarf, blue shirt and a very big grin.', 'The woman is young.', 'neutral'],
        ['snli-dataset', 'A woman with a green headscarf, blue shirt and a very big grin.', 'The woman is very happy.', 'entailment'],
        ['snli-dataset', 'A woman with a green headscarf, blue shirt and a very big grin.', 'The woman has been shot.', 'contradiction'],
    ],
    columns=['dataset', 'text1', 'text2', 'true_label'],
)
predict_data = pd.DataFrame(
    [
        ['neutral'],
        ['neutral'],
        ['neutral'],
        ['neutral'],
        ['neutral'],
        ['neutral'],
    ],
    columns=['predicted_label'],
)
wrapped_dataset = client.wrap_tabular_dataset(
    data,
    task="text-pair-classification",
    columns_to_analyze=['text1', 'text2', 'true_label'],
)
wrapped_output = client.wrap_tabular_output(
    predict_data,
    task="text-pair-classification",
)
evaluation_result = client.evaluate_system(
    task='text-pair-classification',
    system_name='text-pair-classification-test',
    system_output=wrapped_output,
    custom_dataset=wrapped_dataset,
    split='test',
    source_language='en',
    system_details={},
)
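Before submitting, it can help to sanity-check the two frames locally. The sketch below is not part of this PR. It assumes dataset rows and prediction rows are matched by position, which the PR does not state, and it computes a plain accuracy with pandas on the labels from the example above.

```python
import pandas as pd

# Labels copied from the usage example above.
true_labels = pd.Series(
    ["neutral", "entailment", "contradiction", "neutral", "entailment", "contradiction"],
    name="true_label",
)
predicted_labels = pd.Series(["neutral"] * 6, name="predicted_label")

# Assumption: rows are aligned by position, so the lengths must match.
assert len(true_labels) == len(predicted_labels)

# Fraction of rows where the prediction equals the gold label.
accuracy = float((true_labels == predicted_labels).mean())
print(f"local accuracy: {accuracy:.3f}")
```

This is only a quick consistency check on the inputs; it makes no claim about which metrics evaluate_system() reports.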
Next Steps
Add documentation with an example showing how to use these functions.
Closes #56