s-yata / marisa-trie

MARISA: Matching Algorithm with Recursively Implemented StorAge
Other
506 stars 89 forks source link

Add string_view getters and setters #27

Closed jmr closed 4 years ago

jmr commented 4 years ago

In Agent, Key, and Query; add functions to set the string using a std::string_view. Remove the const char * version.

Key::set_str(const char *) => set_str(string_view)
Query::set_str(const char *) => set_str(string_view)
Agent::set_key(const char *) => set_key(string_view)
Agent::set_query(const char *) => set_query(string_view)

Also add getters:

Key::str() -> string_view
Query::str() -> string_view

These are convenient because

  string s;
  agent.set_key(s.data(), s.length());

becomes

  string s;
  agent.set_key(s);

Setting with a const char * still works, since string_view has an implicit const char * constructor.

Similarly,

  string_view(agent.key().ptr(), agent.key.length()) == x

becomes

  agent.key().str() == x

std::string_view is a C++17 feature, so all changes are guarded with #if __cplusplus >= 201703L.

s-yata commented 4 years ago

I agree with that std::string_view version getters and setters are convenient. However, const char * version setters should not be replaced. If the argument type changes depending on __cplusplus, users will be confused.

For example, if libmarisa is built on C++17 or later, callers must be built on C++17 or later. If there is a conflict, the behavior is undefined.

jmr commented 4 years ago

I agree with that std::string_view version getters and setters are convenient. However, const char * version setters should not be replaced. If the argument type changes depending on __cplusplus, users will be confused.

There shouldn't be any confusion. You can still pass a const char * in C++17 mode, since string_view can be implicitly constructed from const char *. I didn't remove any test cases, so you can verify that this still works.

Maybe I should add a comment like, "Don't worry, you can still pass const char *"?

For example, if libmarisa is built on C++17 or later, callers must be built on C++17 or later. If there is a conflict, the behavior is undefined.

This is a good point. Shall I make it a configure option?

s-yata commented 4 years ago

How about adding accessors with new names, such as set_str_view()? We can easily avoid conflict problems.

jmr commented 4 years ago

How about adding accessors with new names, such as set_str_view()?

To me, it seems nicer to have the set_str overloads. What's the advantage of having both set_str(const char *) and set_str_view(string_view)? Just set_str(string_view) will do everything and have slightly smaller binary size.

glebm commented 4 years ago

If I understand correctly, C++ doesn't have that kind of stdlib ABI compatibility. If Marisa was linked against C++17 stdlib, it is not guaranteed to work with C++11 stdlib even if new C++17 stuff isn't used. Am I wrong?

jmr commented 4 years ago

If we make MARISA_USE_STRING_VIEW a config option it will work and save a lot of thinking.

The same version of gcc with different -std= options are compatible: https://stackoverflow.com/a/49119902

I think if we did something like

#if __cplusplus >= 201703L
void set_str(string_view);
#endif
void set_str(const char *);
void set_str(const char *, size_t);

it will even work with both versions, since we're just adding a non-virtual function and not changing the representation. I'm not really sure, since I haven't thought about it enough, though.

s-yata commented 4 years ago

If Agent::set_query(const char *) is inline, I could accept this pull request without worry.

If libmarisa is built on C++17 or later, Agent::set_query(const char *) does not exist. In this case, users must use C++17 or later, because Agent::set_query(const char *) is only declared. I believe it will confuse users.

jmr commented 4 years ago

If Agent::set_query(const char *) is inline, I could accept this pull request without worry.

Why is it better if it's inline?

If libmarisa is built on C++17 or later, Agent::set_query(const char *) does not exist. In this case, users must use C++17 or later, because Agent::set_query(const char *) is only declared. I believe it will confuse users.

Hmm. I fixed this, but now if the user build libmarisa with C++14 or earlier, trying to use set_str(string_view) (with -std=c++17 or equivalent, of course) will just result in a confusing link error (I think).

Maybe MARISA_USE_STRING_VIEW is really best?

s-yata commented 4 years ago

Why is it better if it's inline?

I'm sorry that I could not explain it well.

I would like to show you an example.

#ifndef ECHO_H
#define ECHO_H

#include <string_view>

class Echo {
 public:
#if __cplusplus >= 201703L
  std::string_view echo(std::string_view s) {
    return s;
  }
#else  // __cplusplus >= 201703L
  const char *echo(const char *s);
#endif
};

#endif  // ECHO_H
#include "echo.h"

#if __cplusplus < 201703L
const char *Echo::echo(const char *s) {
  return s;
}
#endif  // __cplusplus < 201703L
// main.cc
#include <iostream>
#include "echo.h"
int main() {
  Echo echo;
  std::cout << echo.echo("Mike") << std::endl;
  return 0;
}

If echo.cc is compiled with C++17 and main.cc is compiled with C++14, the linker fails due to undefined reference as follows:

$ g++ -c -std=c++17 echo.cc
$ g++ -c -std=c++14 main.cc
$ g++ echo.o main.o
main.o: In function `main':
main.cc:(.text+0x26): undefined reference to `Echo::echo(char const*)'
collect2: error: ld returned 1 exit status

This is because Echo::echo(const char *) does not exist in echo.o.

If the definition of Echo::echo(const char *) is provided in echo.h (inline), this problem does not happen.

Maybe MARISA_USE_STRING_VIEW is really best?

I think it's not a good way.

jmr commented 4 years ago

That makes sense, thanks.

2139f49 no longer has that problem, since I only add functions, not delete them, but I'll do something like your echo but with inline.

jmr commented 4 years ago

See if #31 is ok, then I'll do the inline version as you suggest.

jmr commented 4 years ago

I think the current version of this PR (without #31 ) works, with a tiny disadvantage of larger binary size. It's equivalent to this version of echo:

echo.h

#include <string_view>

class Echo {
 public:
#if __cplusplus >= 201703L
  std::string_view echo(std::string_view s) {
    return s;
  }
#endif
  const char *echo(const char *s);
};

echo.cc

#include "echo.h"

const char *Echo::echo(const char *s) {
  return s;
}
$ g++ -c -std=c++17 echo.cc && g++ -c -std=c++14 main.cc && g++ echo.o main.o && ./a.out
Mike

Can we proceed with this and decide whether/how to make Agent::set_query inline later?

s-yata commented 4 years ago

Sorry for my late response.

It looks good to me. Thank you!