mfarragher / obsidiantools

Obsidian tools - a Python package for analysing an Obsidian.md vault
Other
402 stars 28 forks source link

Unable to filter index using Windows filepath with include_subdirs=[] #7

Closed virtualarchitectures closed 2 years ago

virtualarchitectures commented 2 years ago

Hi,

I can successfully view my vault file index in Windows. If I then try to filter the list by subdirectory I can successfully list notes in the root and in the 'docs' folders. If I filter by the name of a lower subdirectory using a Windows filepath the returned list is empty.

For example, my file index includes the following list items:

{'README': WindowsPath('README.md'),
 'index': WindowsPath('docs/index.md'),
 'Quotations': WindowsPath('docs/Quotations.md'),
 'Creative Commons': WindowsPath('docs/Concepts/Creative Commons.md'),
 'Crowdsourcing': WindowsPath('docs/Concepts/Crowdsourcing.md'),
 'Data Format': WindowsPath('docs/Concepts/Data Format.md'),
 'Data Model': WindowsPath('docs/Concepts/Data Model.md'),
 'Data Sovereignty': WindowsPath('docs/Concepts/Data Sovereignty.md'),
}

Based on the obsidiantools-demo I would expect to be able to list all the markdown files in the 'Concepts' folder using the following call:

(otools.Vault(vault_dir, include_subdirs=['docs/Concepts'], include_root=False) .file_index)

Instead the returned object is empty {}.

Reversing the slash to create a linux path resolves the issue:

(otools.Vault(vault_dir, include_subdirs=['docs\Concepts'], include_root=False)
 .file_index)

Returns:

{'Creative Commons': WindowsPath('docs/Concepts/Creative Commons.md'),
 'Crowdsourcing': WindowsPath('docs/Concepts/Crowdsourcing.md'),
 'Data Format': WindowsPath('docs/Concepts/Data Format.md'),
 'Data Model': WindowsPath('docs/Concepts/Data Model.md'),
 'Data Sovereignty': WindowsPath('docs/Concepts/Data Sovereignty.md')}

Ideally this would be resolved by the obsidiantools package rather than the user. Alternatively suggest updating the documentation.

mfarragher commented 2 years ago

@virtualarchitectures the back end does it based on strings so yeah the slash direction is not great for cross-platform.

I've pushed a change in dev branch. I hope that works for Windows - it casts the paths as posix when doing comparison to try to get their strings to use a forward slash. (I'm expecting ['folder1/'] would also work now for filtering on 'folder1', whereas the '/' at the end would be problem in original implementation)

Install: pip install git+https://github.com/mfarragher/obsidiantools.git@dev

virtualarchitectures commented 2 years ago

@mfarragher Just checked but the forward slashes are still returning an empty object.

mfarragher commented 2 years ago

Hmm, that is strange - I do not have the problem on Windows. 🤔

Testing on my largest vault, I get this output as expected, whichever slash I use:

{'Ipsum': WindowsPath('RS/lecture2/Ipsum.md'),
 'Lorem': WindowsPath('RS/lecture2/Lorem.md')}

for

The new code gives undesirable behaviour though, as 'RS' would not include 'RS/lecture2'. I have a way of fixing that but it will require Python 3.9+. Not ready to do that so I will probably roll functions back to the original.

mfarragher commented 2 years ago

.as_posix() returns a string and moves all the slashes in Windows paths to use / so that should be part of the solution.

What happens when you adapt these snippets for your vault?:

Forward slash

[p.as_posix() for p in vault.file_index.values()
 if p.as_posix().startswith('RS/lecture2')]

returns me ['RS/lecture2/Ipsum.md', 'RS/lecture2/Lorem.md']

Back slash

[p.as_posix() for p in vault.file_index.values()
 if p.as_posix().startswith('RS\lecture2')]

returns me []


Those results are what I'd expect.

so the return states I'm thinking of for get_md_relpaths_matching_subdirs (compare to base branch) would be something like:

[i for i in get_md_relpaths_from_dir(dir_path)
 if i.parent.as_posix() in include_subdirs]
virtualarchitectures commented 2 years ago

Hmm, that is strange - I do not have the problem on Windows. 🤔

Testing on my largest vault, I get this output as expected, whichever slash I use:

{'Ipsum': WindowsPath('RS/lecture2/Ipsum.md'),
 'Lorem': WindowsPath('RS/lecture2/Lorem.md')}

for

  • (otools.Vault(VAULT_DIR, include_subdirs=['RS\lecture2'], include_root=False).file_index)
  • (otools.Vault(VAULT_DIR, include_subdirs=['RS/lecture2'], include_root=False).file_index)

The new code gives undesirable behaviour though, as 'RS' would not include 'RS/lecture2'. I have a way of fixing that but it will require Python 3.9+. Not ready to do that so I will probably roll functions back to the original.

Hi @mfarragher I retested this and realised that I'd assumed re-installing with pip install git+https://github.com/mfarragher/obsidiantools.git@dev would update the code. False assumption on my part. After uninstalling and reinstalling the path worked for me both ways: 'docs/Concepts' and 'docs\Concepts'.

image

Testing the further snippets you provided against my own code I get the following on my Windows machine:

Windows Path

image

Linux Path

image

I think that agrees with your findings.

mfarragher commented 2 years ago

Fix is in dev branch (back in January). Will try to get this in v0.7.1 (or even v0.8) on PyPI in August.