Closed: georgerichardson closed this 2 years ago.
I haven't run any sanity checks on the results beyond checking their shape and that row order is retained. I will do a manual check of the results before we merge. There is a small but not insignificant chance that another model would work noticeably better.
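A minimal sketch of the shape and order checks, assuming the flow carries an identifier for each company alongside its embedding. `check_embeddings`, `input_ids`, `output_ids` and `expected_dim` are hypothetical names used for illustration, not code from this PR.

```python
import numpy as np


def check_embeddings(input_ids, output_ids, embeddings, expected_dim=None):
    """Hypothetical sanity check: one embedding row per input, in the original order."""
    emb = np.asarray(embeddings)
    # Shape check: a 2-D array with exactly one row per input description
    if emb.ndim != 2 or emb.shape[0] != len(input_ids):
        raise ValueError(f"expected {len(input_ids)} rows, got shape {emb.shape}")
    # Optional check on the embedding width (depends on the encoder used)
    if expected_dim is not None and emb.shape[1] != expected_dim:
        raise ValueError(f"expected dimension {expected_dim}, got {emb.shape[1]}")
    # Order check: the ids returned with the embeddings match the inputs position by position
    if list(output_ids) != list(input_ids):
        raise ValueError("row order (or the set of ids) changed")
    return True
```

For example, `check_embeddings(["a", "b"], ["a", "b"], np.zeros((2, 384)), expected_dim=384)` returns `True`, while swapping the order of `output_ids` raises a `ValueError`.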
There is now a quality assurance flow, `flow_qa.py`, which produces a chart and a single figure showing the percentage of descriptions that are not truncated by the sentence encoder's maximum input length. It also produces a random sample of 100 company descriptions, each matched to its nearest neighbour by cosine distance between the embeddings produced.
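The two QA outputs could be computed roughly as follows. This is a sketch, not the contents of `flow_qa.py`: it assumes token counts per description have already been obtained from the encoder's tokenizer, that the embeddings are a 2-D NumPy array with no all-zero rows, and all function names are hypothetical.

```python
import numpy as np


def percent_not_truncated(token_counts, max_seq_length):
    """Percentage of descriptions whose token count fits within the encoder's max input length."""
    counts = np.asarray(token_counts)
    return float(100.0 * np.mean(counts <= max_seq_length))


def nearest_neighbours(embeddings):
    """For each row, the index of the closest *other* row by cosine distance."""
    emb = np.asarray(embeddings, dtype=float)
    # Normalise rows so that a dot product equals cosine similarity (assumes no zero rows)
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = unit @ unit.T
    # Exclude self-matches; max similarity is the same as min cosine distance (1 - similarity)
    np.fill_diagonal(sim, -np.inf)
    return sim.argmax(axis=1)


def sample_matches(descriptions, embeddings, n=100, seed=0):
    """Random sample of up to n (description, nearest-neighbour description) pairs."""
    rng = np.random.default_rng(seed)
    nn = nearest_neighbours(embeddings)
    idx = rng.choice(len(descriptions), size=min(n, len(descriptions)), replace=False)
    return [(descriptions[i], descriptions[nn[i]]) for i in idx]
```

For instance, `percent_not_truncated([10, 200, 512, 600], 512)` gives `75.0`. Whether special tokens count towards the limit depends on the tokenizer, so the `<=` comparison is an assumption. The full similarity matrix is O(n²) in memory, so a large set of companies would need a chunked or approximate nearest-neighbour search instead.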
closes #18
Checklist:
- `notebooks/`
- `flake8` and addressed any linter errors
- `pre-commit` and addressed any issues not automatically fixed
- `dev` (or merged any new changes from `dev`)
- `README`s
- `output/reports/`