openai / gpt-2

Code for the paper "Language Models are Unsupervised Multitask Learners"
https://openai.com/blog/better-language-models/

Release The Full Model! #16

Open superjayman opened 5 years ago

superjayman commented 5 years ago

I understand your concerns, but I still think it's better to release the full model now and let people poke at its abilities and discover potential issues more quickly.

MrKrzYch00 commented 5 years ago

@corasundae I'm not a programmer either... I mean, I have a big imagination and logical thinking that force me to use programming languages to realize some of my ideas, as long as I'm not too lazy. For example, I have now implemented a random top_k based on min_top_k and max_top_k to help find hot spots, along with varying temperature... because... it's more convenient to pick it internally, not for the sake of programming. This is the first time I've touched Python, just a bit of the manual and Google and voilà, haha...
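A minimal sketch of what such a random top_k could look like (this is a reconstruction, not the commenter's actual code): min_top_k and max_top_k are hypothetical parameter names, and top_k_logits is the masking helper already present in this repo's src/sample.py.

```python
import random

from sample import top_k_logits  # helper defined in this repo's src/sample.py


def random_top_k_logits(logits, min_top_k=20, max_top_k=60):
    # Pick a fresh k between the two bounds, then reuse the repo's
    # existing top_k masking with that k.
    k = random.randint(min_top_k, max_top_k)
    return top_k_logits(logits, k)
```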

The problem is that sci-fi stuff sometimes crosses over into real-life things. Like, for example, dreams. You want to write dream journeys to other dimensions, describing things like a floating body, its weight, its changed shape and structure. However, sometimes such things can turn into nightmares, as in, the real-life things you are afraid of. Such an AI may have problems generating anything that makes sense in that area if you start feeding it such a sudden change of topic towards things it has no idea about... Not to mention that sci-fi itself can have a lot in common with normal life.

The other interesting part is making it generate more output and read more input. I haven't reached the input limit yet in my Windows-compiled version, but I surely did on one website I used that runs this AI, and I read that there is one. Yet another thing is that it doesn't use all 12 of my threads; it caps at ~26% CPU usage, which is kind of a waste since it takes ~5 minutes to generate something. I'm trying to find a way around that, but so far I haven't made any positive progress. Maybe I will need to recompile TensorFlow myself with LTO and PGO? Hmm...
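Before recompiling TensorFlow, one thing that might be worth trying is explicitly sizing the TF 1.x session thread pools; whether this actually lifts the ~26% cap depends on the ops involved, so treat it as an experiment rather than a fix.

```python
import tensorflow as tf

config = tf.ConfigProto(
    intra_op_parallelism_threads=12,  # threads used inside a single op (e.g. one matmul)
    inter_op_parallelism_threads=12,  # independent ops that may run in parallel
)
with tf.Session(graph=tf.Graph(), config=config) as sess:
    # build the model and run sampling here, as in the repo's sample scripts
    pass
```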

Heck, I'm perfectly fine with agreeing to terms not to release any of my personally written code and not to use the AI for anything that may impact others (even advertising sites that openly use it, so as not to cause any physical problems in others), unless supervised or allowed based on a presented concept of usage - maybe even a personally signed clause. Kind of stiff rules, but if it may be dangerous, better safe than sorry.

dackdel commented 5 years ago

https://twitter.com/npcollapse/status/1136559379602427904?s=21

eziolotta commented 5 years ago

https://medium.com/@NPCollapse/gpt2-counting-consciousness-and-the-curious-hacker-323c6639a3a8 https://github.com/ConnorJL/GPT2

He releases the model on July 1st!!!! You are right in what you say: the real danger is in the lack of courage and experimentation. The human being is destined for discovery; we can make mistakes, but curiosity will save us... always!!! Connor!!! Release the model!!!

Great!!!!

FurkanGozukara commented 5 years ago

The full model should be released to the public.

FurkanGozukara commented 5 years ago

After doing some extensive tests on the 345M model and seeing that it mostly generates gibberish, I think this whole not-releasing issue is a total exaggeration :)

MrKrzYch00 commented 5 years ago

I will allow myself to comment a bit more after having tested 345M for quite a long time.

For me it seems it keeps good attention if enough text is written (usually 768 or more tokens), at the cost of less output, but more on-topic output is generated. Also, writing the text as one paragraph (even if it shouldn't be one) forces it to be more perceptive (and it even creates a full one-paragraph continuation if enough input tokens are sent). Sometimes even using different words or weird grammar creates better output. During tests checking how the algorithm could describe worn clothing, I noticed something strange. It worked astonishingly well most of the time, generating the whole outfit by itself from just a short prompt about the main item! But when I simplified one paragraph from a few sentences to two short ones, it went off topic again, and I'm not sure whether that was the text length being shorter or the graph being altered so much that it couldn't predict what I wanted.

I was also playing a bit with the parameters, and the results were quite good at a temperature of 1.006 - 1.0016 or so with a TOP_P of 0.9 - 0.93 (modified version) most of the time. Though even with it running against a database on a local web server, generating 6 samples per batch with the ability to change parameters on each generation, it is hard for me to tell whether these parameters are really that good, due to the randomness and varying input length (I suspect there is some kind of relation between that and how the parameters should be set). However, saving the meta file was also creating interesting results. It could sometimes relate to totally different input from the past! This also helped a tiny bit when a sliding text window was required because the story itself was too big to keep going, but without repeating what was what or who was who it usually failed.

So in my current opinion, even if the 345M model can still be improved by some changes in the sample generator code and an implementation of TOP_P, the attention seems a bit too loose, and I don't know whether that is because the model data is not correctly weighted across sampled topics, whether some algorithm changes are needed to improve that part, or whether the parameters should have their own attention so they can change dynamically after (or during) input analysis. So again, I'm wondering whether the bigger model has exactly the same issue, or whether, because it's bigger, the output will also be of better value? Will it...?

EDIT: I'm now testing a minimum TOP_P to see whether narrowing the logits further at a lower level has some positive impact.
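For reference, here is a rough numpy sketch of nucleus (top_p) filtering with an optional lower cumulative bound, which is one possible reading of "minimum TOP_P"; the commenter's actual implementation is not shown in this thread.

```python
import numpy as np


def nucleus_filter(probs, top_p=0.9, min_p=0.0):
    # Sort tokens from most to least likely and keep only those whose
    # cumulative probability mass falls between min_p and top_p.
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    keep = (cumulative >= min_p) & (cumulative <= top_p)
    if not keep.any():
        keep[0] = True  # degenerate settings: fall back to the single top token
    mask = np.zeros_like(probs, dtype=bool)
    mask[order[keep]] = True
    filtered = np.where(mask, probs, 0.0)
    return filtered / filtered.sum()  # renormalize before sampling
```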

Cyvadra commented 5 years ago

I thought they were releasing the full code n all in 6 months. If not, I'm gonna go crazy! I really really want to understand GPT-2 so I can solve AGI faster! It's hard enough I can't be mentored by the ones who know it well and therefore not by others for years to come, but it's even harder if the code etc isn't even there to teach other practitioners! I've got the whole AGI theory down in just 1/3 years, all I need is GPT-2 and we're out of this crazy world and onto ASI!!!

Well, boy, AGI will not be realized through the current structure of neural networks.

bladedsupernova commented 5 years ago

It will, I have discovered the fundamentals in AGI and GPT-2.

xSNYPSx commented 5 years ago

https://medium.com/@NPCollapse/gpt2-counting-consciousness-and-the-curious-hacker-323c6639a3a8 https://github.com/ConnorJL/GPT2

He releases the model on July 1st!!!! You are right in what you say: the real danger is in the lack of courage and experimentation. The human being is destined for discovery; we can make mistakes, but curiosity will save us... always!!! Connor!!! Release the model!!!

Great!!!!

So did this guy release the full model?

AndrewBarfield commented 5 years ago

They are NOT going to release anything because there is NOTHING TO RELEASE.

They do not have a "full model". IT NEVER EXISTED.

PROVE ME WRONG.

https://www.ftc.gov/faq/consumer-protection/submit-consumer-complaint-ftc

bladedsupernova commented 5 years ago

They have released 2 models so far, and you can see the percentages get better with 345M: https://gpt2.apps.allenai.org/?text=The%20unicorn%20is%20a

KoolenDasheppi commented 5 years ago

https://medium.com/@NPCollapse/gpt2-counting-consciousness-and-the-curious-hacker-323c6639a3a8 https://github.com/ConnorJL/GPT2 He releases the model on July 1st!!!! You are right in what you say: the real danger is in the lack of courage and experimentation. The human being is destined for discovery; we can make mistakes, but curiosity will save us... always!!! Connor!!! Release the model!!! Great!!!!

So did this guy release the full model?

He didn't. I'm assuming OpenAI got to him. What a bunch of freeloaders. They can't even release the full model because they're ignoring the fact that just because it generates realistic text doesn't mean there aren't countermeasures in place to tell whether text is fake or not.

xSNYPSx commented 5 years ago

I wrote a paper that describes how AGI can work and how to reach it. The work is in Russian, but you can translate it: https://xsnypsx.livejournal.com/265.html

AlphaGit commented 5 years ago

@xSNYPSx Are you going to be working more on it? It seems like a very good start, but it is still too high-level to be actionable. If you are thinking of a deeper dive, I might be able to lend a hand.

xSNYPSx commented 5 years ago

@xSNYPSx Are you going to be working more on it? It seems like a very good start, but it is still too high-level to be actionable. If you are thinking of a deeper dive, I might be able to lend a hand.

This work reflects a philosophical and theoretical approach to the creation of AGI. I have no plans to create the practical part, because that really requires a high level of skill that is not accessible to everyone. But I will expand the theoretical part in the future.

MrKrzYch00 commented 5 years ago

Allow me to put in another 2 cents here.

I still wonder whether the bigger model would really be that much better. Judging by the curves in the released paper, the largest improvement was from 117M to 345M. It is still interesting to compare, but right now I can generate 5 samples in a batch in reasonable time (even ~1 second for 5x 16 tokens, which gives more or less one sentence per sample - quite good for generating ideas or even copy-pasting into the story being written). If I had to drop to, say, 2 samples and wait two or four times as long for a slightly better result, would that really be a good trade-off? Interestingly, limiting the output length doesn't change the output itself beyond it being shorter and generated faster (tested with seed = 1). I was hoping it could make the AI more interested in the input than in its own output (more input weight than output weight), but unfortunately that was not the case. I mean, I hoped for something like "predict only a few points in the graph given this much input, so do it more carefully and do it better". Looking at the transformer site, though, it seems to be more of a chain-reaction kind of processing.
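For anyone wanting to reproduce the "5 short samples per batch with a fixed seed" setup, the repo's own script entry point can be called directly; the parameter names below follow src/interactive_conditional_samples.py (the 345M model is assumed to be downloaded as described in the README, and src/ must be on the Python path).

```python
from interactive_conditional_samples import interact_model

interact_model(
    model_name='345M',
    seed=1,          # fixed seed so repeated runs are directly comparable
    nsamples=5,
    batch_size=5,    # all five samples generated in one batch
    length=16,       # roughly one sentence per sample
    temperature=1.0,
    top_k=40,
)
```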

Making the AI go with the flow is not easy. What I did find, and like doing, is using long comma-separated sentences or ending the prompt with "and", "which", "because", "suddenly" (this one actually works better as a new sentence, I think), "then", "as follows:", "following features:" and similar, which usually helps a lot. That long sentence can then be split into separate sentences, which, I may be wrong, but seems at times to produce quite satisfactory results. This is why I'm mainly interested in comparing exactly that behavior of 345M with the bigger models. Will it be better, or will it just increase the variety even further (due to more text learned, I guess, if I understand it correctly), making it harder to get the AI to follow the flow?

bladedsupernova commented 5 years ago

I sold my life to GPT-2; it is the best modern AI technology. Have you seen it work on images? Check their website out (under the blog).

GPT-2 should be fed to all little children just starting kindergarten. It is such a highly valuable mechanism, covering all of nature and the optimization of human brain intelligence, that it should not be hidden like this even from scientists.

Yes, there will be powerful bodies in the near future, but GPT-2 sits in between: so much teaching can be done while still being safe. GPT-2 doesn't immediately explain full AGI, but it lays the groundwork for prosperity in all forms. Internet fake news can be solved, and in fact a solution is already nearing because of the released 345M. Let us know so we can watch out for ourselves.

bladedsupernova commented 5 years ago

Where did your reply go, last replier? Anyhow, very true: anyone can write the most perfect, longest fake story 80 times and put it in just the right places... fooling the most important people... it's up to us... it's nature. Let it happen. We already handle crap. 345M still generates weird things every other run, by the way, which clearly identify it, like when it makes no sense, contradicts itself, changes topic, etc.

bladedsupernova commented 5 years ago

3 days ago:

I am nearing the replication of GPT-2 XAI !!!!

I have spent 2,000 USD+ so far. It will still get up to 10x better too; I'm not done at all. It intelligently knows, using many forms of 'math', how to add the next word and stop symbols. I checked my 400MB of data, which had no training done on it, and 6/5 word parts were not found. I hired and instructed a programmer to code it for me, since I don't code or do mathematics but only know how it works, and I just turned 24 years old, with no title to my name other than a vicious research drive.

input: The software was made on a

Seed=The software was made on a wide variety of devices, and operating apps and applications that users can easily read as an app for android. It is a bit of a difference, but i was able to get it. The developers are not going to make it through a web applications, and devices i have seen in the running for the mobile apps. Applications allows users to access applications development tools, and allow applications of the app store. A multimedia entertainment entertainment device, and allows platforms enabled access to hardware interfaces. Using a bit of html application app developers can enable users to access applications to investors, and provide a more thorough and use of development. The other a little entertainment media, and user development systems integration technology. Applications allows users to automatically provide access to modify, optimize capability allows users to easily enable. Both users and software systems, solutions allowing owners software solutions solutions to integrate widgets customers a day. And if you are accessing services product, and mobile applications remotely access to the software companies can easily automate application access to hardware devices hardware systems creators and technologies. Builders and developers are able to access the desktop applications, allowing users access allows users to

Looking for funding and teamwork. super777smart@gmail.com

MrKrzYch00 commented 5 years ago

@bladedsupernova, as long as you don't destroy the world, keep it up, ha ha. As long as the license allows it, you can do whatever you want with the original code and transform it into a completely different thing, I think.

Unfortunately, I'm not good at math either, and TensorFlow really has a lot of that math stuff going on. The only thing I managed to do is make a UI that works through a small local web server using MariaDB, add input/output scaling, and allow the input to function as a sliding window when a negative output length is defined - it cuts the beginning of the input if the output would be less than [defined * -1], so it always provides the expected output length. Within TensorFlow I was only able to use logical_and and logical_or logic with the nucleus (TOP_P) implementation (from the pull requests) to make it function as ranges, like 0 - 0.07 & 0.55 - 0.7 & 0.83 - 0.903, etc. I don't know if there is any point to that, as testing it thoroughly requires re-running the sample generator with a static seed for every sample (looping shouldn't be used then). Nonetheless, I'm still pretty content with the current model. Once one learns how to use it, it's pretty good by itself.
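A hedged TF 1.x reconstruction of that "ranges" idea is sketched below: it keeps only tokens whose cumulative probability (over the sorted distribution) falls inside one of several bands, combining the bands with tf.logical_and / tf.logical_or. This is an illustration of the concept rather than the commenter's code; the band values are just examples, and a TF 1.x version that provides tf.sort is assumed.

```python
import tensorflow as tf


def top_p_range_logits(logits, ranges=((0.0, 0.07), (0.55, 0.7), (0.83, 0.903))):
    # logits: [batch, vocab]. Sort once and compute cumulative probabilities per rank.
    logits_sort = tf.sort(logits, direction='DESCENDING', axis=-1)
    probs_sort = tf.nn.softmax(logits_sort, axis=-1)
    cumulative = tf.cumsum(probs_sort, axis=-1)

    keep = tf.zeros_like(logits, dtype=tf.bool)
    for lo, hi in ranges:
        # Ranks whose cumulative mass lies inside this band...
        band = tf.logical_and(cumulative >= lo, cumulative <= hi)
        # ...translated back to the unsorted vocabulary via the band's logit interval.
        hi_logit = tf.reduce_max(
            tf.where(band, logits_sort, tf.fill(tf.shape(logits_sort), -1e10)),
            axis=-1, keepdims=True)
        lo_logit = tf.reduce_min(
            tf.where(band, logits_sort, tf.fill(tf.shape(logits_sort), 1e10)),
            axis=-1, keepdims=True)
        in_band = tf.logical_and(logits >= lo_logit, logits <= hi_logit)
        keep = tf.logical_or(keep, in_band)

    # Everything outside all bands gets an effectively -inf logit before sampling.
    return tf.where(keep, logits, tf.fill(tf.shape(logits), -1e10))
```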

bladedsupernova commented 5 years ago

Mine is fully modifiable and explainable :)

xSNYPSx commented 5 years ago

I translated my work into English so more people can read it here: https://xsnypsx.livejournal.com/523.html

xluxeq commented 5 years ago

Your brain is smarter.

Cyvadra commented 5 years ago

I translated my work into English so more people can read it here: https://xsnypsx.livejournal.com/523.html

So my point of view is that AGI will never be achieved until we unite the 'senses' altogether (CV, speech-to-text, NLP, robotics, all human senses) to let machines connect the hidden vectors inside their neural networks to actual concepts in our human, real world, which needs far more than NLP: another representation of those 'parameters' in the black box. Well, no worry, we're not Elon after all.

Cyvadra commented 5 years ago

Maybe training them together can help?

xSNYPSx commented 5 years ago

Imagine one big neural network. At its input there would be all the feelings accessible to a human, and at its output there would be all the possibilities available to a human for interacting with the world. This is the most simplified model of what you need.

bladedsupernova commented 5 years ago

I read your post; the translation was really fine. I agree with all of what you say. You do leave out some quite important matters, though; it is far from the grand theory. Your post above is getting closer: yes, you need 'GPT-2'.

xSNYPSx commented 5 years ago

Imagine one big neural network. At its input there would be all the feelings accessible to a human, and at its output there would be all the possibilities available to a human for interacting with the world. This is the most simplified model of what you need.

Then give this neural network the opportunity to multiply if it succeeds or to die if it fails. Put it in a body that changes slightly, in a random way, with each generation. After a large number of generations, and with a sufficient set of parameters, you will get a semi-intelligent creature ;)

bhack commented 5 years ago

Take a look at http://gltr.io/

bladedsupernova commented 5 years ago

Yes, it looks like it works, but I can still just set GPT-2 to use less likely words and it will still get through.

bladedsupernova commented 5 years ago

Also, my 'GPT-2' I made so far passes the test HAHAHAHA WHJAHAHA!!!!!!!!! WORLD DOMINATION!!!!

TRY ENTERING MY COMPLETED PROMPT:

The software was made on a The software was made on a wide variety of devices, and operating apps and applications that users can easily read as an app for android. It is a bit of a difference, but i was able to get it. The developers are not going to make it through a web applications, and devices i have seen in the running for the mobile apps. Applications allows users to access applications development tools, and allow applications of the app store. A multimedia entertainment entertainment device, and allows platforms enabled access to hardware interfaces. Using a bit of html application app developers can enable users to access applications to investors, and provide a more thorough and use of development. The other a little entertainment media, and user development systems integration technology. Applications allows users to automatically provide access to modify, optimize capability allows users to easily enable. Both users and software systems, solutions allowing owners software solutions solutions to integrate widgets customers a day. And if you are accessing services product, and mobile applications remotely access to the software companies can easily automate application access to hardware devices hardware systems creators and technologies. Builders and developers are able to access the desktop applications, allowing users access allows users to

MrKrzYch00 commented 5 years ago

OK. Here it goes. TOP_P ranges: 0 - 0.007, 0.55 - 0.7, 0.83 - 0.904

SAMPLE 1: [...] million square feet of production space – enough to house a large US military deployment.

SAMPLE 2: [...] Pentium M3, 1.8GHz Intel 1.6GHz quad-core CPU, 1GB of RAM, and 2.4GB of the free, open source XKCD PGA source software. (All code of similar powerplant complexity is available here.) Just as I was making my own Linux desktop environment — a nod to the pioneering work of Philip Wadler of Xerox and Alan Kay of Nvidia — I started compiling several years' worth of images. These consist of many early versions of OpenOffice-GIMP, Reader, Writer, Thing, Magnifier, Secret Hunters, Tableau, Perrin-Yang Fade, CoreMap, Sketch, Datasources, and OS-supporting frameworks like PuTTY, Blender, and AlphaLinux.

SAMPLE 3: [...] legacy software environment (Azure DB) and thus uses XMLHttpRequest etc. The other OS framework, App Engine for the 3rd party web applications wasn't tested yet.

SAMPLE 4: [...] one-chip architecture from single strand fiber-optic technology with advanced signal processing technology for extreme data transfer rates and accelerated responsiveness. The instant raw data to match with deep processing ability enabled the device to handle up to 250,000 connections per second, with global bandwidth of 9 Gigabits per second.

SAMPLE 5: [...] Windows 7 machine, but it is still clearly all over the place. Windows 5 is especially annoying because one of the big caveats with Windows 7 is that the classic Start menu is still accessible from the Start screen, but with it off, you must search and download a app on the Xbox 360. When you actually start using Windows 7 on the Xbox 360, you will need to navigate to the Start screen to find this app.

But if you meant to complete your output: SAMPLE 1: [...] configure and setup their devices and computers web applications that users install. The apps that are shared among other applications in other applications (whatever it is a applications, it's obvious a level of convergence of programming to organize and manage infrastructure, configuration, and projects. Multiple applications can be opened in one window, however.

SAMPLE 2: [...] get real cost detailed information. And The developers added included discovery of the product sales at model train runs (the platform builders) availability at the core terms build a market.

SAMPLE 3: [...] access the client computer software solutions and enterprise applications. This all brings flexibility in several ways enabling engagement with various applications, as well as mobile devices providing quick access. Projects, and team management solutions allow for sharing and collaboration among multiple users. Users can easily share software application, with or without services and other applications. Read more<|endoftext|>

SAMPLE 4: [...] interact with the developers which allows them to provide connections between applications. Developers can increase their profits and productivity by providing users with context to their own own work, and allow users to have their own personal work but complete with designed interfaces. Through application and solutions they provide to their own software by using their own application or tools. By enabling the building of code software developers have the option to self distribute software, but if the buyer isn't comfortable accessing their own software products, they can secure a large selection of code in the form of apps with their products.

SAMPLE 5: [...] form and edit a game, allow apps to get access to the instruction manual. Create multiple main applications for a single facility. Microvision/Cosmic Vision is able to integrate its software into a module design, allowing customers to create and edit software applications for integrated project control. With site delivery software enables the creation of site solution. Family A business using have operating systems, by moving from OSX to Windows to Internet based version of access users to support applications. An active developer support team that holds your e-mail addresses and creating a protocol for exchanging emails, and enables you to participate in local hangouts. Likewise, licensed acousticians assist with diagnosis/diagnosis recommendations and diagnostic data can be analyzed and analyzed for use in research. So it's a little bit like starting over again. If you are running dmfp-admin you need to learn an additional set of words.<|endoftext|>

If it were to be compared, the same seed should be used (which I have randomized at the moment). So what's the point? :P

bladedsupernova commented 5 years ago

Wait, I'm a bit confused, what is that? Are those GPT-2 tests using my prompt on different Top-K parameter settings to fool the IBM test at http://gltr.io/ ?

MrKrzYch00 commented 5 years ago

Nope, my local installation with TOP_P ranges. You wrote, "TRY ENTERING MY COMPLETED PROMPT," so I did. :P

bladedsupernova commented 5 years ago

Ok I understand your post now. Still a question though:

So, you adjusted Top_P ranges inside of GPT-2 parameters, and they fool IBM's test at http://gltr.io/ ? Yes? Yours seems to work/pass their test.

MrKrzYch00 commented 5 years ago

I adjusted them based on my feeling that the output produced was better, but I didn't play with it for too long, as it's set up dynamically and requires reinitializing the model inside a loop, which adds ~3 seconds of overhead and increases memory usage. Kind of like this:

[image]

bladedsupernova commented 5 years ago

Oh, OK, all good then. So, GPT-2 does certainly trick IBM's detector of fake generated text. I'll look into that a bit more, actually, if I have time.

As for my post, I actually meant:

1) Go to http://gltr.io/

2) Enter: The software was made on a wide variety of devices, and operating apps and applications that users can easily read as an app for android. It is a bit of a difference, but i was able to get it. The developers are not going to make it through a web applications, and devices i have seen in the running for the mobile apps. Applications allows users to access applications development tools, and allow applications of the app store. A multimedia entertainment entertainment device, and allows platforms enabled access to hardware interfaces. Using a bit of html application app developers can enable users to access applications to investors, and provide a more thorough and use of development. The other a little entertainment media, and user development systems integration technology. Applications allows users to automatically provide access to modify, optimize capability allows users to easily enable. Both users and software systems, solutions allowing owners software solutions solutions to integrate widgets customers a day. And if you are accessing services product, and mobile applications remotely access to the software companies can easily automate application access to hardware devices hardware systems creators and technologies. Builders and developers are able to access the desktop applications, allowing users access allows users to

3) I'm cool, lol. My algorithm fools IBM's detector.

MrKrzYch00 commented 5 years ago

Sure. I think if you cut towards 0 completely you lose some essential grammar words like "the" and "a"; that's why I keep it low. At 0.1 or 0.2 (I don't remember which) there was usually government stuff. However, I'm not completely sure and would need to retest, as I had a bug in the boolean logic before and I don't remember whether I saw that before or after fixing it. Afterwards I only tested it with tf.Print, watching the 0s and 1s change for the logits.

bladedsupernova commented 5 years ago

Also, an update: GLTR says the green highlights are what GPT-2 thinks is the most likely next word. This only works for GPT-2, not my model. Are they saying that if the next word predicted after some previous word is a certain word, then it's fake? Why? Why is 'that girl' fake? But GPT-2 is powerful. This tool only shows whether GPT-2 made it; the colored blobs don't mean it has worse complexity/diversity/English - it just means GPT-2 made it, and to top it off, a GPT-2 parameter tricks it anyway. In reality, mine and GPT-2 both produce great, diverse 'English' that is informative to the user.

MrKrzYch00 commented 5 years ago

I believe that if there is a fork or a re-trained model, it will fool that test tool, as the tool is most likely using, or was written against, the original code. Even though there is some randomness kicking in while building the graph, the basic principle of how the output will probably look might stay the same. However, fiddling with extra parameters creates additional changes that may reduce GPT-2's flaws or other "watermarks" GLTR is able to detect. This is just a theory of mine, though.

See, if you just get rid of grammar words by cutting some of the TOP_P range, it will most likely think that some human just wrote idiotic nonsense or something, I guess...

bladedsupernova commented 5 years ago

They say on that link it uses GPT-2, correct. And yes, detection will become impossible eventually since it will be true human intelligence soon.

bladedsupernova commented 5 years ago

New discovery: if the detector ran with the same parameters, it would not show up as purple highlights; everything would be green!

The detector says what GPT-2 thinks comes next... yes... but if you have different parameters, then oops, unexpected!

They are better off looking for hard proof of generation... in the data, not the model. A rough example is logic, or stuff like what I write :-) looky at that!!! And as said, GPT-2 does the 'intelligence'; it will soon become impossible to tell whether or not it is human.

MrKrzYch00 commented 5 years ago

I think it's not that... For it to be human intelligence, it would need to train itself and be able to sort the information, not just be trained on data once and complete the output. I think GPT-2's purpose here is to predict a possible continuation of the input data. That's why it's pretty good for stories, as long as your sliding window doesn't lose some valuable information like the characters' ages, genders, and names. Otherwise it confuses things together. Funny fact: "Then he/she left" doesn't usually work well, and the AI likes to still have that person nearby during dialogue or action.

Like software, it would need to store some important data in some kind of variables, and know what is important on, let's say, a continue-button click. Unless you want it to just write some short input-output stuff to keep someone amused (which can even be a post on Twitter, etc.), in which case its purpose is kind of OK, as long as it doesn't go too far astray.

For me, the milestone would be for it to maintain output fluency by creating as little unwanted output as possible. Nonetheless, you do have sudden turnarounds in articles or stories, so it should know how to properly use the "tension" or "suspense" emotion (or whatever I should call it). But since no human is well versed in every possible topic, I'm more interested in whether the AI will pass that level at some point and produce varied outputs; and then how will one control that and make sure it doesn't mix criminal stories with sci-fi, or worse, what if one WANTS it to do just that all of a sudden? Someone may even want to write a crazy mind story.

I mean, "human intelligence" requires some assumptions first and be under discussion with others. What do you expect it to do at that level? If we both had the same "human intelligence" (speaking very generally but if one was to assume it's the knowledge, experience etc.) we most likely wouldn't be talking here because we would find it useless as we would know what to do next and our knowledge would be the same so one would have nothing to share with the other and vice versa. :)

bladedsupernova commented 5 years ago

Yes, there are still things GPT-2 is missing to be human: logic, and adding 'random' stuff from random or more intelligent processes (lol, the two are opposites; we probably do the more intelligent one, but we can temperature ourselves to say random dodo pooooop), so for now, those are the remaining tell-tales of 'fake news'.

Notice I know they are both opposite things... lol... saying lol means you discovered a comparison between 2 different things.

Even saying 'lol' in text is strange, why i say that LOL

omg.....LOL??

OMG look at my self-analyzing...

i even know i am self recognizing...

loops

MrKrzYch00 commented 5 years ago

Define fake news. I actually write sci-fi material. Previously I wrote stories for my own amusement about androids, VR (game and non-game full-body capsules), transformations, mind transfers, and alternative realities including different times/ages, and, as hard as it is to believe, this can even work when mixing things together, even love, affection and deeper. The problem is that it's either hard to get it to follow along unless enough "data" is in the input, or hard to create a turnaround unless you type enough from your side (again, "data").

I haven't tried paradoxes yet, like time paradoxes. Those are among my favorites.

EDIT: Oh, I forgot to mention. The checkpoint metadata can be saved, but I'm not sure why the previous save is not kept after the next separate script run saves it. It may have something to do with that small line "tf.train.latest_checkpoint", but I'm not sure whether that's OK or not. And this may be one of the reasons why my GPT-2 is a bit different, as I have already seen it sometimes relating to my previous inputs. So it might have got used to my own writing style a bit. I'm, however, more interested in whether that should only be done when working on one story at a time and then reset, as this may be the kind of "data" for it to relate to. I would need to research that part more. I noticed this behavior for the first time when I was extensively using the talktotransformer site. I saw my previous input stories being somewhat mixed into the output while it was working in, I believe, a loop. But then after a crash (the site was not generating for some time) it felt like the defaults kicked in.

EDIT2: OK, I added the missing line to train the metadata as well. I will see whether that has some impact on single-story mode, which would maybe help with the sliding window that is required (1024 tokens max). Maybe the bigger model is not the main issue, but rather the fact that each run resets the user's typing behavior and data, making the model rely solely on the trained material.
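For what it's worth, tf.train.latest_checkpoint only reads the small "checkpoint" index file in the model directory and returns whatever path was recorded there most recently, which would explain why only the newest save seems to survive between runs. A quick way to see what it currently points at (directory layout as in the README):

```python
import os

import tensorflow as tf

model_dir = os.path.join('models', '345M')
# Prints the path recorded in models/345M/checkpoint, i.e. the save that the
# repo's sample scripts will restore on their next run.
print(tf.train.latest_checkpoint(model_dir))
```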

bladedsupernova commented 5 years ago

You can see the top 10 next words on AllenAI's demo for both 345M and 117M, so you can compare with talkToTransformer and see that the results do indeed stay the same over time.

https://gpt2.apps.allenai.org/?text=you%20should%20be%20able%20to%20create%20your%20first

I'll save a pic of that page and keep an eye on it. I'm sure I'll try more tests over the next while; I'll try some now, in fact, to kickstart!!

bladedsupernova commented 5 years ago

Tested it: same percentages. It did a good job; I tried lots of actions. I will check again later and post it here, maybe in a month.

MrKrzYch00 commented 5 years ago

3x I, 2x he in my samples. The site says 41.6% for I and 2.4% for he.

Also, did you try saving a TensorBoard graph? The problem is, I wish I could decode the token numbers to words somehow - I've just run it for the first time.
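Regarding decoding token numbers back to words: the repo's own BPE encoder can do this. A small sketch, assuming it is run from src/ with the 345M model downloaded (in some revisions of the repo, get_encoder also takes a models_dir argument):

```python
import encoder  # this repo's src/encoder.py

enc = encoder.get_encoder('345M')
print(enc.decode([195]))            # the text a single token id maps to
print(enc.encode('Hello world'))    # and the reverse: text -> token ids
```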

[image]

bladedsupernova commented 5 years ago

No, I didn't save a TensorBoard graph. What is that image of yours there? I see a line in the shape of a loop; why so?

MrKrzYch00 commented 5 years ago

It is one of two TensorBoard graphs generated from graph files exported with TensorFlow during sample generation; however, I'm not really sure how to use it practically... This one just represents the placeholders (1-1024) for tokens. The text view is unavailable as it requires a bit more preparation, I think, and I'm not sure whether this isn't just the model's general structure. The other graph displays the tokens available for use on an axis.

[image]

195 seems to be ĉ, meaning it will most likely be used when something near it happens to be in the text?