jqlang / jq

Command-line JSON processor
https://jqlang.github.io/jq/
Other
29.59k stars 1.54k forks source link

adds jv_unshare(), jv_is_unshared() #3109

Open MaxBrandtner opened 2 months ago

MaxBrandtner commented 2 months ago

Still need to add some more tests. I would especially apprecieate some feedback on jv_new(), as this can perhaps be handled more efficiently using jvp

MaxBrandtner commented 2 months ago

Is there a way to make jv_unshare() more efficient using jvp?

MaxBrandtner commented 1 month ago

As it pertains to unit-tests. Especially with jv_paths() and jv_addpath() I would just add sufficiently complex examples to test that it produces the correct output for a use-case that covers the various edge-cases. As it pretains to jv_unshare() I am somewhat unsure how to go about implementing it. I would check for equality and then recursively check that it is not identical (using jv_identical()), but I don't know if jv_identical() would be accurate in the case of partial equality (aka what if the the value of an object is not identical, but because of a hypothetical bug the key were identical)

MaxBrandtner commented 1 month ago

the checks to test whether the various subsections are not identical fail. Does jv_identical() not do what I think it does or what gives?

nicowilliams commented 1 month ago

Can you update the PR's title to mention jv_unshare() instead of jv_new()?

Please update the description to explain what these functions do (or comment in the sources) and the motivation?

I'm quite happy to have jv_unshare(), by the way. I'm surprised it's not called jv_copy_deep() or jv_deepcopy() or something, but I guess jv_unshare() is kind of a better name.

wader commented 1 month ago

I'm quite happy to have jv_unshare(), by the way. I'm surprised it's not called jv_copy_deep() or jv_deepcopy() or something, but I guess jv_unshare() is kind of a better name.

My fault, but i did suggested jv_copy_deep first and jv_unshare as an alternative :) I think jv_copy_deep might be a better and less confusing name. Probably also good to include in the comment that it can be used for making a separate non-shared value suitable for use in another thread.

nicowilliams commented 1 month ago

the checks to test whether the various subsections are not identical fail. Does jv_identical() not do what I think it does or what gives?

jv_identical() checks that if a has a pointer then b has the same pointer. Clearly jv_identical() will never return true(ish) for values that have pointers where one is a jv_unshare() deep copy of the other. Use jv_equal() for that.

nicowilliams commented 1 month ago

I'm quite happy to have jv_unshare(), by the way. I'm surprised it's not called jv_copy_deep() or jv_deepcopy() or something, but I guess jv_unshare() is kind of a better name.

My fault, but i did suggested jv_copy_deep first and jv_unshare as an alternative :) I think jv_copy_deep might be a better and less confusing name. Probably also good to include in the comment that it can be used for making a separate non-shared value suitable for use in another thread.

No, I rather like jv_unshare(), so let's keep it. At any rate we shouldn't bikeshed this too much. jv_new() was not a good name though, so thanks for suggesting a better name.

nicowilliams commented 1 month ago

Thank you, jv_unshared() will be useful. It's very useful in finding leaks in libjq-using programs/libraries. I've seen others resort to jv_parse(jv_string_value(jv_dump_string(...))) to do a deep copy exactly for the purpose of working around valgrind's (and other memory debuggers') inability to deal with reference counting. (I wish valgrind had macros that one could use to tell it that a reference count is being incremented or decremented. That way when there is a leak valgrind could print all the stack traces where a leaked value's reference count was incremented, then one could look around each for a missing jv_free(). Since valgrind has nothing of the sort, a deep copy operation can be substituted for jv_copy() in many cases to get valgrind to help as if it knew how to track reference count increment/decrements.

We'll probably not use jv_unshare() in-tree, except during development, and then it might prove invaluable.

I think jv_addpath() is probably the same as jv_setpath(), and if so -or if you can just use jv_setpath() anyways- let's drop it.

I also think jv_paths() is unnecessary in this PR for testing jv_unshare(), so let's drop jv_paths().

Lastly it'd be great if you could add a int jv_unshared(jv) predicate function that indicates whether the given jv is deeply unshared. I think such a predicate would be very useful for testing jv_unshare().

MaxBrandtner commented 1 month ago

the checks to test whether the various subsections are not identical fail. Does jv_identical() not do what I think it does or what gives?

jv_identical() checks that if a has a pointer then b has the same pointer. Clearly jv_identical() will never return true(ish) for values that have pointers where one is a jv_unshare() deep copy of the other. Use jv_equal() for that.

and yet in the unit-test for jv_unshare() jv_identical() returns true once I start checking recursively and I don't know why

nicowilliams commented 1 month ago

~@MaxBrandtner jv_is_shared() is a better name, because that function can return faster than jv_is_unshared() can.~

MaxBrandtner commented 4 weeks ago

I've reduced the pr to jv_unshare() and jv_is_unshared().

jv_is_unshared() has some oddities (like internally calling jv_free() before jv_is_unshared() or checking for a refcnt 3 in the keys of jv_objects), but it works.

As it pertains to jv_unshare() it seems as if there was never an issue to begin with, but its test was simply incorrect