instructlab / eval

Python library for Evaluation
Apache License 2.0

Ensure that SDG path data branch matches evaluation branch #35

Closed: nathan-weinberg closed this issue 1 week ago

nathan-weinberg commented 2 months ago

We need to ensure the SDG path data we receive was generated from the same branch we are passing for evaluation.

I have some sample data indicating that this is tracked in the SDG data as an origin_branch_name field.

We need to meet with the SDG team and ensure that data will be there and can be consumed in a predictable way.
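For illustration, a minimal sketch of what such a check could look like. It assumes the SDG output is a JSONL file and that each record carries a top-level origin_branch_name key, as in the sample data; neither is confirmed for the real SDG output yet. The function name find_branch_mismatches is hypothetical and not part of any existing library.

```python
import json
from pathlib import Path


def find_branch_mismatches(jsonl_path, expected_branch, field="origin_branch_name"):
    """Return (line_number, found_value) for every record whose branch
    field is missing or differs from expected_branch.

    Assumption: one JSON object per line, with the branch stored under a
    top-level key (origin_branch_name in the sample data).
    """
    mismatches = []
    with Path(jsonl_path).open(encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            found = record.get(field)  # None when the field is absent
            if found != expected_branch:
                mismatches.append((line_number, found))
    return mismatches
```

An empty result would mean every record was generated from the branch being evaluated. Records that lack the field are reported with a value of None, so a missing field is surfaced rather than silently accepted.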

russellb commented 2 months ago

Right now the data format is the same as it was before in the CLI. The branch information does not exist in the output.

nathan-weinberg commented 2 months ago

Ack @russellb - is the plan to have that information added to the output? This is an example of the sample data I'm basing the above assumptions off of: https://github.com/nathan-weinberg/eval/blob/test/tests/testdata/sdg/tonsil_data.jsonl

cc @danmcp

russellb commented 2 months ago

You should be looking at data generated by the CLI. I don’t think that’s where that data came from.

File an issue against the sdg repo with any requested differences.

nathan-weinberg commented 2 months ago

@russellb sounds good, @aakankshaduggal and I are going to talk about this more later today, cc @oindrillac

nathan-weinberg commented 2 months ago

@russellb @oindrillac I just spoke with @aakankshaduggal. The data format y'all are planning for training in the issue above is the same format we are expecting for Eval, so we should be good on our side as soon as that's complete!

russellb commented 2 months ago

Assuming someone is going to adjust the CLI training code for the new format as well, then?

nathan-weinberg commented 1 month ago

This issue really only pertains to Eval, so I can't speak to Training.