Hi, although the latest v0.10.0 doesn't offer a way to do this, I'm considering adding a class method `from_chat_str` to `Reader` based on your question, so that you can do something like this:
```python
import pylangacq as pla

reader = pla.Reader.from_chat_str(chat_data_str, encoding='utf-8')
```
where `chat_data_str` is your CHAT data as an in-memory string (the contents of what would otherwise be a single CHAT data file).
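For anyone unfamiliar with the format, here is a small, made-up and simplified CHAT transcript held in a string, plus a toy function that pulls out the speaker tiers. `toy_utterances` is a hypothetical helper for illustration only, not part of pylangacq; pylangacq's own parser handles far more of the CHAT format (headers, dependent tiers, continuation lines, and so on).

```python
# A tiny, made-up (and simplified) CHAT transcript held entirely in memory.
chat_data_str = (
    "@Begin\n"
    "@Languages:\teng\n"
    "@Participants:\tCHI Target_Child, MOT Mother\n"
    "*MOT:\twhat do you want ?\n"
    "*CHI:\tmore cookie .\n"
    "@End\n"
)


def toy_utterances(chat_str):
    """Return (speaker, utterance) pairs from main tiers.

    Hypothetical helper: only main tiers that start with '*' and use a
    tab after the colon are recognized; everything else is ignored.
    """
    pairs = []
    for line in chat_str.splitlines():
        if line.startswith("*") and ":\t" in line:
            code, text = line[1:].split(":\t", 1)
            pairs.append((code, text))
    return pairs


print(toy_utterances(chat_data_str))
# → [('MOT', 'what do you want ?'), ('CHI', 'more cookie .')]
```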
The current master branch on GitHub has just been updated with this class method. You may try it out by installing this dev version of pylangacq:
```shell
pip install git+https://github.com/pylangacq/pylangacq.git
```
Could you let me know whether this works for your use case? If so, I'll make a release on PyPI so that this class method is more readily available via `pip install pylangacq`. Thanks!
Thanks! I’ll take a look, probably tomorrow afternoon. If you’re interested, the reason I’m looking for this capability is that I’m building a web-based system for social science data analysis that makes libraries available to researchers. Users and coders of the system access data through an API rather than a file system.
Here's a link: https://tactic.readthedocs.io/en/latest/index.html
That worked like a charm! The only thing that's missing is a way to load the equivalent of multiple strings, corresponding to multiple chat files. Something like the equivalent of the `add` method would do the trick.
Some other thoughts for the future: I see that you made this work by writing a temporary file to the local file system. That made it possible for you to make this change with only a small addition. On my system, however, code run as part of a user's data analysis isn't really supposed to write anything to the file system. (It's a long story. Your new code still works on my system; there are just some small limitations.) When I originally looked at your code, I was looking for a way to keep the files around as something like `StringIO` instances. But it looked like that would require a lot of changes distributed throughout the code.
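The temp-file approach described above can be sketched roughly as follows. This is not pylangacq's actual implementation: `parse_chat_file` is a hypothetical stand-in for any parser that only accepts a file path, and `parse_chat_str_via_tempfile` is a hypothetical wrapper showing the write, parse, clean-up cycle.

```python
import os
import tempfile


def parse_chat_file(path):
    """Hypothetical path-only parser: counts main-tier ('*') lines."""
    with open(path, encoding="utf-8") as f:
        return sum(1 for line in f if line.startswith("*"))


def parse_chat_str_via_tempfile(chat_str):
    """Write the string to a temporary file, parse it, then delete the file."""
    # mkstemp creates the file and returns an OS-level handle plus its path.
    fd, path = tempfile.mkstemp(suffix=".cha")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(chat_str)
        # The file is closed here, so the path-based parser can reopen it.
        return parse_chat_file(path)
    finally:
        os.remove(path)


print(parse_chat_str_via_tempfile("*CHI:\tmore cookie .\n*MOT:\tokay .\n"))  # → 2
```

It needs only a small wrapper around existing path-based code, but it still writes to disk, which is exactly the limitation in sandboxed environments.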
So, for my very specific case, it would help a little if the data weren't written to the file system. But I'm not sure other users would need that. And the current version does 99% of what I need. Thanks again.
> The only thing that's missing is a way to load the equivalent of multiple strings, corresponding to multiple chat files.
One workaround would be to create an empty `Reader` instance and then `add` the individual `Reader` objects instantiated from each CHAT string. Something like this (not tested):
```python
import pylangacq as pla

master_reader = pla.Reader()
for chat_str in chat_strs:  # chat_strs is your container of CHAT strings
    reader = pla.Reader.from_chat_str(chat_str)
    # Reported later in this thread to need master_reader.update(reader)
    master_reader.add(reader)
```
> When I originally looked at your code, I was looking to see if there was a way to keep the files around as something like StringIO instances.
`StringIO` did cross my mind yesterday evening when I tried to implement `from_chat_str`, but I went down the temp file path instead to get something operational real quick for you to test out. Using `StringIO` should be possible; the extra work would just be a bit of refactoring. Stay tuned for another update!
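As a rough sketch of the `StringIO` idea, assuming the parsing logic is refactored to accept any text stream rather than a file path: `io.StringIO` then lets an in-memory string stand in for an open file, so nothing is written to disk. `parse_chat_stream` and `parse_chat_strs` are hypothetical names for illustration, not pylangacq functions.

```python
import io


def parse_chat_stream(stream):
    """Hypothetical parser that reads from any text stream.

    Returns the speaker codes of main-tier ('*') lines, in order.
    """
    speakers = []
    for line in stream:
        if line.startswith("*") and ":" in line:
            speakers.append(line[1:].split(":", 1)[0])
    return speakers


def parse_chat_strs(chat_strs):
    """Parse several in-memory CHAT strings without touching the disk."""
    return [parse_chat_stream(io.StringIO(s)) for s in chat_strs]


print(parse_chat_strs(["*CHI:\tmore cookie .\n", "*MOT:\tokay .\n*CHI:\tyes .\n"]))
# → [['CHI'], ['MOT', 'CHI']]
```

Because `parse_chat_stream` only iterates over lines, the same function also works on a real file object from `open(...)`, so file-based and string-based entry points can share one code path.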
That workaround sounds pretty darn easy. I'll give it a try. Thanks!
A minor update: I just got around to trying the workaround suggested above. I think the last line needs to be `master_reader.update(reader)` rather than `master_reader.add(reader)`.
Apologies for the long silence. I've just released v0.13.0. I ended up rewriting the whole `Reader` class, with a fair number of breaking changes (see the changelog). The `Reader` classmethod `from_strs` reads CHAT data strings without hitting the disk (see its documentation). I'm closing this issue as resolved. Please let me know if you have any questions.
Hi, I'm wondering if there's a way to take a chat file that exists as a string in memory and load it directly into a `Reader` instance. Thanks for any help.