hassonlab / 247-pickling

Contains code to create pickles from raw/processed data

New pickling #43

Closed by VeritasJoker 2 years ago

VeritasJoker commented 2 years ago

Changes in this PR:

  1. Moved the bad convo filtering from 247-pickling to 247-encoding

    • tfsemb_concat.py (added a condition for blenderbot)
    • tfspkl_build_matrices.py (removed the bad convo condition and filtering)
  2. New encoder (will not feed in the words afterward)

    • tfsemb_main.py
  3. Added models and tokenizers used inside the download file

    • tfsemb_download.py
  4. Typo fixes

    • tfsemb_main.py (fixed a typo that caused an error for gpt2-xl embeddings for 676)

hvgazula commented 2 years ago

Done. Thanks for the work.