o19s / skipchunk

Extracts a latent knowledge graph from text and indexes/queries it in Elasticsearch or Solr
MIT License

Paths on Windows may be getting mangled #18

Open kealist opened 2 years ago

kealist commented 2 years ago

Description

I started working through the Solr example on Windows (saved locally as skipchunker.py, but it is the Solr example), and I'm running into path issues. It looks like the package directory name is being truncated to skipchun, and Windows paths don't seem to be supported in general.


FileNotFoundError: [WinError 3] The system cannot find the path specified: 'C:\\Users\\kealist\\code\\git\\scraped-data\\src\\.venv\\lib\\site-packages\\skipchun\\skipchunk/solr_home/configsets/skipchunk-graph-configset'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:/Users/kealist/code/git/scraped-data/src/skipchunker.py", line 64, in <module>
    gq.index(s)
  File "C:\Users\kealist\code\git\scraped-data\src\.venv\lib\site-packages\skipchunk\graphquery.py", line 99, in index
    ok1 = self.engine.index(predicatedocs,timeout=timeout)
  File "C:\Users\kealist\code\git\scraped-data\src\.venv\lib\site-packages\skipchunk\solr.py", line 182, in index
    isCore = self.indexCreate()
  File "C:\Users\kealist\code\git\scraped-data\src\.venv\lib\site-packages\skipchunk\solr.py", line 153, in indexCreate
    raise ValueError(message)
ValueError: DISK ERROR! Could not find the schema at C:\Users\kealist\code\git\scraped-data\src\.venv\lib\site-packages\skipchun\skipchunk/solr_home/configsets/skipchunk-graph-configset

A separate issue: lxml 4.5.2 is not compatible with Python 3.10 on Windows. It does install on 3.8, though, so it would be great if that dependency could be updated.

kealist commented 2 years ago

https://github.com/o19s/skipchunk/blob/master/skipchunk/utilities.py#L16 is the offending line

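import os

module_dir = os.path.dirname(os.path.abspath(__file__))
print(module_dir)
# Bug: Windows paths use '\', so rfind('/') returns -1 and the slice only drops the last character
source_dir = module_dir[0:module_dir.rfind('/')]
print(source_dir)

# FIX: take the parent directory via os.path; joining with '' keeps the trailing separator
source_dir = os.path.join(os.path.dirname(module_dir), '')
print(source_dir)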

results in:


c:\Users\kealist\code\git\scraped-data\src\
c:\Users\kealist\code\git\scraped-data\sr
c:\Users\kealist\code\git\scraped-data\
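
The truncation can be reproduced on any operating system, which may help when testing a fix without a Windows machine. The following is a minimal sketch, not the library's code: the `module_dir` value is a hypothetical install location modelled on the traceback, and `ntpath` and `PureWindowsPath` are used only so that Windows path rules apply regardless of the host platform.

```python
import ntpath  # Windows path semantics, importable on any OS
from pathlib import PureWindowsPath

# Hypothetical install location, modelled on the traceback in this issue
module_dir = r"C:\Users\kealist\site-packages\skipchunk"

# Original approach: a Windows path contains no '/', so rfind returns -1
idx = module_dir.rfind('/')
truncated = module_dir[0:idx]  # equivalent to module_dir[:-1]

# Separator-aware alternatives: both return the real parent directory
parent_ntpath = ntpath.join(ntpath.dirname(module_dir), '')  # keeps trailing '\'
parent_pathlib = str(PureWindowsPath(module_dir).parent)     # no trailing '\'

print(idx)
print(truncated)
print(parent_ntpath)
print(parent_pathlib)
```

In library code the platform-neutral `os.path` or `pathlib.Path` would be the natural choice, since both pick the right separator rules for the host system.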
kealist commented 2 years ago

Wanted to touch base to see if there is any way to get a release with the fix in it as I would really like to try to use this lib.

kealist commented 2 years ago

Any way to get a new package with this?