Open Ox0400 opened 1 year ago
Hi. Curious, how do you plan on using the new methods? Personally, I only use the Rust part of the project and do not use the Python bindings; I implemented them to learn Python/Rust integration. My intention for the Python version is to use the Python wrappers simhash.py and minhash.py, where you can provide your own tokenizer. Are you using gaoya.gaoya.simhash.SimHash128StringIntIndex directly?
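Something like this is what I had in mind for the wrappers. A minimal sketch using the minhash wrapper (the tokenizer callable and all parameter values below are illustrative placeholders, not a recommendation; the simhash wrapper would be used analogously):

from gaoya.minhash import MinHashStringIndex

# Sketch only: pass your own tokenizer as the analyzer callable instead of
# the built-in 'word' / 'char' analyzers.
def my_tokenizer(doc: str):
    # Replace with your real tokenization (jieba, spaCy, etc.).
    return doc.lower().split()

index = MinHashStringIndex(
    hash_size=32,
    jaccard_threshold=0.5,
    num_bands=42,
    band_size=3,
    analyzer=my_tokenizer,
)

corpus = [
    'This is the first document.',
    'This document is the second document.',
    'And this is the third one.',
]
for i, doc in enumerate(corpus):
    index.insert_document(i, doc)

print(index.query('Is this the first document?'))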
Hi, yes, I was using SimHash64StringIntIndex, like this:
from typing import List, Tuple

from gaoya.simhash import SimHashStringIndex


class SimHashTool(SimHashStringIndex):
    def size(self) -> int:
        return self.index.size()

    def iter(self) -> List[Tuple[int, int]]:
        # e.g. [(100, 879782272769711604), (101, 879782272769711604)]
        return self.index.iter()

    def par_bulk_tokens2signatures(self, tokens_list: List[List[str]]) -> List[int]:
        return self.index.par_bulk_tokens2signatures(tokens_list)

    def par_bulk_insert_sig_pairs(self, id_sig_pairs: List[Tuple[int, int]]) -> int:
        self.index.par_bulk_insert_sig_pairs(id_sig_pairs)
        return self.size()

    def query_tokens_return_distance(self, tokens: List[str]) -> List[Tuple[int, int]]:
        return self.index.query_tokens_return_distance(tokens)

    def insert_tokens(self, doc_id: int, tokens: List[str]) -> None:
        self.index.insert_tokens(doc_id, tokens)

    def par_bulk_insert_tokens_pairs(self, id_tokens_pairs: List[Tuple[int, List[str]]]) -> int:
        self.index.par_bulk_insert_tokens_pairs(id_tokens_pairs)
        return self.size()

    # more functions ....
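For context, this is roughly how I drive the wrapper and the new bulk methods. The whitespace tokenizer is a stand-in for my real tokenization, and constructing SimHashTool with no arguments is only an assumption for the sketch; whatever parameters SimHashStringIndex actually requires would go there.

# Illustrative usage of the wrapper above (tokenizer and construction are placeholders).
docs = {
    100: 'the quick brown fox jumps over the lazy dog',
    101: 'the quick brown fox jumped over a lazy dog',
}

def tokenize(text: str) -> List[str]:
    return text.lower().split()

tool = SimHashTool()  # assumption: constructor arguments elided for brevity

# Bulk path: compute signatures in parallel, then insert (id, signature) pairs.
ids = list(docs)
signatures = tool.par_bulk_tokens2signatures([tokenize(docs[i]) for i in ids])
tool.par_bulk_insert_sig_pairs(list(zip(ids, signatures)))

# Query with distances, e.g. [(100, 0), (101, 1)].
print(tool.query_tokens_return_distance(tokenize('the quick brown fox jumps over the lazy dog')))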
Thanks for the pull request, @Ox0400. I will take a look over the weekend.