Closed sophieball closed 4 years ago
Can I do that without writing the comments to disk?
I removed dumping the model. I checked, I'm not writing the prompt type summary to disk.
But do you still have the output from main/train_prompt_types? There are some comments. You can totally remove them. But I'm just wondering if we can see the arcs (those things like don't* -> you_)? Those may help label prompt types (although they don't seem to be super helpful in our task, we might still be able to explain why they are less helpful here than they are in predicting conversations gone awry).
I still have that output, so we can inspect those arcs.
But this line is problematic:
comments_10K = pd.read_csv("src/data/random_sample_10000_prs_body_comments.csv")
Because we shouldn't store the comments in a CSV, but would rather read them from standard input.
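A minimal sketch of what reading from standard input could look like, assuming the comments arrive as CSV text on stdin. The helper name `read_comments` and the `id`/`body` columns are hypothetical, used only for illustration; `pd.read_csv` accepts any file-like object, so `sys.stdin` can stand in for the file path.

```python
import io
import sys

import pandas as pd


def read_comments(stream=None):
    """Parse CSV comment rows from a text stream (stdin when none is given)."""
    if stream is None:
        stream = sys.stdin
    return pd.read_csv(stream)


# Demonstration with an in-memory stream standing in for stdin.
# The column names here are made up for the example.
demo = io.StringIO("id,body\n1,looks good\n2,please rename this\n")
comments_10K = read_comments(demo)
```

With this shape nothing is written to disk on the Python side; the caller pipes the rows in and the script only ever holds them in memory.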
OH!! right... lemme see... how to pass in 2 df from R to py? (That's why I was doing this prediction in 2 steps...)
You probably can't pass in two, but I could combine the two. The format of the two is pretty similar, right? Wherever they're not similar, we could just fill in nulls.
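One way the combine-then-split idea could be sketched on the Python side, assuming each frame gets a tag column so the rows can be separated again. The `source` column, the function names, and the sample data are all hypothetical. `pd.concat` fills columns missing from one frame with nulls (NaN).

```python
import pandas as pd


def combine_frames(sample, target):
    """Stack two frames into one, tagging each row with where it came from."""
    tagged = [sample.assign(source="sample"), target.assign(source="target")]
    # Columns present in only one frame are filled with NaN in the other's rows.
    return pd.concat(tagged, ignore_index=True, sort=False)


def split_frames(combined):
    """Undo combine_frames: recover the two frames using the tag column."""
    sample = combined[combined["source"] == "sample"].drop(columns="source")
    target = combined[combined["source"] == "target"].drop(columns="source")
    return sample.reset_index(drop=True), target.reset_index(drop=True)


# Made-up data: the second frame has an extra column the first one lacks.
sample_df = pd.DataFrame({"id": [1, 2], "body": ["a", "b"]})
target_df = pd.DataFrame({"id": [3], "body": ["c"], "label": [True]})
combined = combine_frames(sample_df, target_df)
sample_back, target_back = split_frames(combined)
```

After splitting, each frame still carries the other's extra columns as all-null; those could be dropped with `dropna(axis=1, how="all")` if they get in the way.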
@CaptainEmerson I added the conversation you gave me as one of the tests. I hope it works now..
Before you merge, do you mean to check in all these files? main/pt_model_10K.files/* are all output files, right? Otherwise, the results are in our shared directory.
I was thinking that if we save those pt_models after running it the first time on the 10K directly from the server, we won't need to run the 10K code again and again in the future. But now I think I can remove them: running the 10K doesn't take too long, they might change the API again in the future, and the dump files may contain comments.
I'll remove them before I merge.
oh sorry forgot to clarify - this should be your sampled 10K CLs.