Open ghost opened 3 years ago
Hi @naphthasl ! Thanks for submitting this issue!
I believe the behaviour you are looking for is provided by cwk_path_get_absolute, which I believe generates an absolute path the way you describe. cwk_path_join is currently designed to actually join the two path segments.
Does that help you, or do you need additional functionality?
Is cwk_path_get_absolute
not designed in order to find the absolute path of a relative path? Because that isn't what I'm trying to do. I do not want the output to be an absolute path.
In the vast majority of higher level languages, there is a function for joining two paths by walking through them, as I demonstrated with Python in my initial post.
The problem is that cwk_path_join
does not do what every other library does do, and neither does cwk_path_get_absolute
.
The important takeaway here is that joining together /hello
and /world
into /hello/world
is not a useful operation. There are no situations where this output would be useful, because it is invalid output. Again, see the way os.path.join
works in Python, it's pretty much the standard way of writing a function like this.
cwk_path_get_absolute
does what cwk_path_join
was supposed to do almost perfectly, except it always returns an absolute path, which is not what I need.
I appreciate that my explanation is kind of difficult to understand, but seriously, take a look at os.path.join
. It's the safest (and most sane) way to implement this. Here's a demonstration of what your function SHOULD do, and what both cwk_path_get_absolute
and cwk_path_join
are not capable of doing:
>>> os.path.join("first", "second")
'first/second'
>>> os.path.join("/first", "/second")
'/second'
>>> os.path.join("hello/../world///foo/hello/world", "../world")
'hello/../world///foo/hello/world/../world'
>>> os.path.join("/test/world", "../")
'/test/world/../'
>>> os.path.join("sane/", "../")
'sane/../'
>>> os.path.join("./sane/", "./fast.txt")
'./sane/./fast.txt'
>>>
You will notice that os.path.join does not automatically remove the ../
and ./
. That is not os.path.join's job - it does not exist to simplify your paths, it exists to walk between them. /first
and /second
do not become /first/second
, you walk to /first
and then you walk over to /second
, and arrive at /second
.
So, to summarize - with pretty much every other path library, if you want to walk from one directory to another, you join their paths and then normalize them. You can see how the normalization step works in Python, too:
>>> os.path.normpath('hello/../world///foo/hello/world/../world')
'world/foo/hello/world'
>>> os.path.normpath('/test/world/../')
'/test'
>>>
I'm not really sure where you went wrong here - I think you may have misunderstood the "join" operation as some kind of "normalized concatenation" which is not what a join is supposed to do, because this "normalized concatenation" is not useful for the vast majority of programs, but a "normalized join" or "normalized walk" is extremely useful.
Essentially, you really, really need a function that replicates the output of os.path.join
(or equivalent) in other languages and libraries, because as of right now you do not. Something that replicates os.path.normpath(os.path.join(a, b))
would also be quite nice. Please test your functions against these other libraries, it will improve the quality and usability of this library significantly. In its current state, it will break or be unusable under many conditions/demands, and is also differs significantly from other languages and libraries, making it difficult to use.
Okay, cwk_path_get_relative
seems like it might be closer to what I need. I'll feed it a bunch of tests to see if it works as expected. If it does, then that's great - I would advise you to rename it though, because it isn't immediately obvious that it does the same thing as path joining everywhere else. Maybe you could have a tip in the documentations comparing your functions to other functions elsewhere?
Note: cwk_path_get_relative
still auto-normalizes the path. This is fine for my use-case, but may be problematic for others, as the normalization step ought to be optional as it is with other libraries and their join function(s).
My worry is that people might be using cwk_path_join
in order to do what cwk_path_get_relative
sort of does. If they are, there are many conditions in which their code could break. Since cwk_path_join
isn't really useful for anything outside of the library (in comparison to cwk_path_get_relative), maybe you should make it private or rename it, as it may only confuse people and break their code.
Hi @naphthasl, thanks for your detailed response and explanation. I will definitely give this a thought on how I can improve it and at least be more clear in the documentation. I understand the behaviour might be different to other libraries and this could cause confusion.
Please be careful, cwk_path_get_relative generates a relative path and does not do the same as Python's path join!
You're right, neither cwk_path_get_relative
and cwk_path_get_absolute
work here. Do you have a function whereby testgame/
+ shaders
= testgame/shaders
? (Other than cwk_path_join
, of course)
Also, cwk_path_get_absolute
does not return a path that is absolute. Yes, the path begins with a /
, but it returns /testgame/shaders
, which is also highly problematic. Absolute paths start at the filesystem root. I guess it ought to output /home/naphtha/Code/HAZE/testgame/shaders
- see how it starts at the actual root?
Since we've been using Python equivalents a lot lately:
>>> os.path.abspath(".")
'/home/naphtha/Code/cwalk/build'
cwalk does not actually resolve files on the disk, that's because it is much more portable this way. But cwk_path_get_absolute can actually do more than python's abspath because it accepts a base path. You can simply supply your current working directory as a base path to achieve a similar result though:
#include <stdio.h>
#include <stdlib.h>
#include <cwalk.h>
#ifdef _WIN32
#include <direct.h>
#define getcwd _getcwd
#elif
#include <unistd.h>
#endif
int main(int argc, char *argv[]) {
char buffer[FILENAME_MAX];
char cwd[FILENAME_MAX];
if(!getcwd(cwd, sizeof(cwd))) {
return EXIT_FAILURE;
}
cwk_path_get_absolute(cwd, ".", buffer, sizeof(buffer));
printf("%s\n", buffer); // should output /home/naphtha/Code/HAZE/testgame/shaders depending on your working directory
return EXIT_SUCCESS;
}
Ah, I must've misunderstood how your cwk_path_get_absolute
works. That's actually pretty useful. I think your solution of combining two get_absolutes and a single get_relative with getcwd() will work as a general purpose solution.
Here's an idea, though. Instead of renaming or rewriting anything, maybe there could be an extra function called cwk_fs_path_join
which replicates os.path.join
's functionality? Something like this, based on the code you provided earlier...
#ifndef MAX_CWD
#define MAX_CWD 4096
#endif
size_t cwk_fs_path_join(const char *base, const char *path, char *buffer, size_t buffer_size)
{
char cwd[MAX_CWD];
if(!getcwd(cwd, MAX_CWD)) return 0;
cwk_path_get_absolute(cwd, base, buffer, buffer_size);
cwk_path_get_absolute(buffer, path, buffer, buffer_size);
return cwk_path_get_relative(cwd, buffer, buffer, buffer_size);
}
That way, you can retain the portability of the existing functions while providing easy to use alternatives for people who are looking to replicate functionality similar to other languages.
I had second thoughts about that solution because it will always return a relative path from your working directory. If that works for you that's perfectly fine - however if it is an issue you could also do something more simple like this:
size_t custom_join_paths(const char *path_a, const char *path_b, char *buffer,
size_t buffer_size)
{
if (cwk_path_is_absolute(path_b)) {
return cwk_path_normalize(path_b, buffer, buffer_size);
} else {
return cwk_path_join(path_a, path_b, buffer, buffer_size);
}
}
The output will be normalized though, I hope that won't be an issue for you.
I appreciate your input a lot! I will have to think about what to do relating to the library interface.
Ah - I probably should've said this earlier, but my reasoning behind implicitly using the working directory was because I'd expect a lot of people would feed the output of the function straight into fopen()
, where all paths are relative to the working directory or absolute from the filesystem root. I was thinking about how it would be possible to make this library a bit easier to use for people who need to make a lot of f*
calls (which is a pretty big use case)
The behaviour of cwk_path_join should be similar to running
cd
on the first argument, andcd
on the last argument directly afterwards.We can see this works with simple statements, like joining together
hello
andworld
: Output of cwk_path_join:hello/world
Output of os.path.join from Python:hello/world
Output of twocd
s:hello/world
But, if we try to do something more subtle, like joining together
/hello
and/world
, cwalk fails catastrophically: Output of cwk_path_join:/hello/world
(See https://likle.github.io/cwalk/reference/cwk_path_join.html - This flaw is even demonstrated in the documentation!) Output of os.path.join from Python:/world
Output of twocd
s:/world
So, why is this happening? It seems that cwk_path_join doesn't actually "walk" or "separate cd" as it is intended to, but rather word-for-word smashes the two paths together.
Unless there is an alternative function that does this properly, cwk_path_join should NOT operate like this. Every argument to cwk_path_join should operate like a separate, sequenced cd.
If I run
cd /hello
and thencd /world
, I end up in/world
, because of the/
. I do not end up in/hello/world
. If I used/hello
and then./world
(or justworld
), it would be/hello/world
, but I did not do this.Unreliable behaviour of cwk_path_join makes this library completely unusable for many people. I would suggest testing your function against Python's os.path functions.